Machine Learning Paradigms


Aristomenis S. Lampropoulos George A. Tsihrintzis

Machine Learning Paradigms

Applications in Recommender Systems


Intelligent Systems Reference Library

Volume 92

Series editors

Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland e-mail: kacprzyk@ibspan.waw.pl

Lakhmi C. Jain, University of Canberra, Canberra, Australia, and University of South Australia, Adelaide, Australia

e-mail: Lakhmi.Jain@unisa.edu.au


The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and well structured form. The series includes reference works, handbooks, compendia, textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains well integrated knowledge and current information in the ﬁeld of Intelligent Systems. The series covers the theory, applications, and design methods of Intelligent Systems. Virtually all disciplines such as engineering, computer science, avionics, business, e-commerce, environment, healthcare, physics and life science are included.

More information about this series at http://www.springer.com/series/8578



Department of Informatics, University of Piraeus, Piraeus, Greece

Department of Informatics, University of Piraeus, Piraeus, Greece

ISSN 1868-4394 ISSN 1868-4408 (electronic) Intelligent Systems Reference Library

ISBN 978-3-319-19134-8 ISBN 978-3-319-19135-5 (eBook) DOI 10.1007/978-3-319-19135-5

Library of Congress Control Number: 2015940994 Springer Cham Heidelberg New York Dordrecht London

©Springer International Publishing Switzerland 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microﬁlms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a speciﬁc statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)


To my beloved family and friends

Aristomenis S. Lampropoulos

To my wife and colleague, Prof.-Dr. Maria Virvou, and our daughters, Evina, Konstantina and Andreani

George A. Tsihrintzis

Foreword

Recent advances in Information and Communication Technologies (ICT) have increased the computational power of computers while, at the same time, computing has been embedded in various mobile devices. The combination of the two leads to an enormous increase in the extent and complexity of data generation, storage, and sharing. "Big data" is the term commonly used to describe data so extensive and complex that they may overwhelm their users, overload them with information and, eventually, frustrate them. YouTube, for example, has more than 1 billion unique visitors each month, while 72 hours of video are uploaded to it every minute! It would be extremely difficult for a YouTube user to retrieve the content he/she is really interested in unless some help is provided.

Similar difﬁculties arise with all types of multimedia data, such as audio, image, video, animation, graphics, and text. Thus, innovative methods to address the problem of extensive and complex data are expected to prove useful in many and diverse data management applications.

In order to reduce the risk of information overload of users, recommender system research and development aims at providing ways of individualizing the content returned to a user via attempts to understand the user’s needs and interests.

Speciﬁc recommender systems have proven useful in assisting users in selecting books, music, movies, clothes, and content of various other forms.

At the core of recommender systems lie machine learning algorithms, which monitor the actions of a recommender system user and learn about his/her needs and interests. The fundamental idea is that a user provides, directly or indirectly, examples of content he/she likes ("positive examples") and examples of content he/she dislikes ("negative examples"), and the machine learning module seeks and recommends content "similar" to what the user likes and avoids recommending content "similar" to what the user dislikes. This idea sounds intuitively correct and has, indeed, led to useful recommender systems. Unfortunately, users may be willing to provide examples of content they like, but are very hesitant when asked to provide examples of content they dislike. Recommender systems built on the assumption that both positive and negative examples are available do not perform well when negative examples are rare.



It is exactly this problem that the authors have tackled in their book. They collect results from their own recently published research and propose an innovative approach to designing recommender systems in which only positive examples are made available by the user. Their approach is based on one-class classification methodologies from recent machine learning research.

The blending of recommender systems and one-class classiﬁcation seems to be providing a new very fertile ﬁeld for research, innovation, and development.

I believe the authors have done a good job addressing the book topic. I consider the book at hand particularly timely and expect that it will prove very useful to researchers, practitioners, and graduate students dealing with problems of extensive and complex data.

March 2015

Dumitru Dan Burdescu

Professor, Eng., Math., Ph.D.

Head of Software Engineering Department, Director of "Multimedia Application Development" Research Centre, Faculty of Automation, Computers and Electronics, University of Craiova, Craiova, Romania


Preface

Recent advances in electronic media and computer networks have allowed the creation of large and distributed repositories of information. However, the immediate availability of extensive resources for use by broad classes of computer users gives rise to new challenges in everyday life. These challenges arise from the fact that users cannot exploit available resources effectively when the amount of information requires prohibitively long user time spent on acquaintance with and comprehension of the information content. Thus, the risk of information overload of users imposes new requirements on the software systems that handle the information. Such systems are called Recommender Systems (RS) and attempt to provide information in a way that will be most appropriate and valuable to their users and prevent them from being overwhelmed by huge amounts of information that, in the absence of RS, they would otherwise have to browse or examine.

In this monograph, we first explore the use of objective content-based features to model the individualized (subjective) perception of similarity between multimedia data. We present a content-based RS which constructs music similarity perception models of its users by associating different similarity measures with different users. The results of the evaluation of the system verify the relation between subsets of objective features and individualized (music) similarity perception and exhibit significant improvement in individualized perceived similarity in subsequent recommended items. The investigation of these relations between objective feature subsets and user perception offers an indirect explanation and justification for the items one selects. The users are clustered according to specific subsets of features that reflect different aspects of the music signal. This assignment of a user to a specific subset of features allows us to formulate indirect relations between his/her perception and corresponding item similarity (e.g., music similarity) that involve his/her preferences. Consequently, the selection of a specific feature subset can provide a justification of the various factors that influence the user's perception of similarity with respect to his/her preferences.

Secondly, we address the recommendation process as a hybrid combination of one-class classification with collaborative filtering. Specifically, we follow a cascade scheme in which the recommendation process is decomposed into two levels.


In the first level, our approach attempts to identify for each user only the desirable items from the large amount of all possible items, taking into account only a small portion of his/her available preferences. Toward this goal, we apply a one-class classification scheme, in the training stage of which only positive examples (desirable items for which users have expressed an opinion in the form of a rating value) are required. This is very important, as it is hard, in terms of time and effort, for users to explicitly express what they consider non-desirable. In the second level, either a content-based or a collaborative filtering approach is applied to assign a corresponding rating degree to these items. Our cascade scheme first builds a user profile by taking into consideration a small amount of his/her preferences and then selects possible desirable items according to these preferences, which are then refined and mapped onto a rating scale in the second level. In this way, the cascade hybrid RS avoids known problems of content-based or collaborative filtering RS.
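To make the two-level decomposition concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' implementation: level 1 is stood in for by a toy one-class filter (cosine similarity to the centroid of the user's positive items, with a hypothetical `threshold`), and level 2 by a simple collaborative step that averages other users' ratings of each surviving item. All function names and data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def level1_desirable(features, positives, threshold=0.8):
    """Level 1 (toy stand-in for one-class classification): keep unseen items
    whose features are close to the centroid of the user's positive items."""
    dim = len(next(iter(features.values())))
    centroid = [sum(features[i][d] for i in positives) / len(positives)
                for d in range(dim)]
    return [i for i in features
            if i not in positives and cosine(features[i], centroid) >= threshold]

def level2_rate(candidates, ratings, user):
    """Level 2 (toy collaborative step): score each candidate by the mean
    rating other users gave it; candidates nobody else rated are dropped."""
    scored = {}
    for item in candidates:
        others = [r[item] for u, r in ratings.items() if u != user and item in r]
        if others:
            scored[item] = sum(others) / len(others)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: item feature vectors and a sparse rating table.
features = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.2]}
ratings = {"alice": {"a": 5}, "bob": {"b": 4, "d": 2}, "carol": {"b": 5, "c": 1}}
candidates = level1_desirable(features, positives={"a"})
print(level2_rate(candidates, ratings, user="alice"))  # [('b', 4.5), ('d', 2.0)]
```

Note that only the positive item `a` is used to train level 1; no negative examples are needed, which mirrors the motivation of the cascade.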

The fundamental idea behind our cascade hybrid recommendation approach is to mimic the social recommendation process in which someone has already identified some items according to his/her preferences and seeks the opinions of others about these items, so as to make the best selection of items that fall within his/her individual preferences. Experimental results reveal that our hybrid recommendation approach outperforms both a pure content-based approach and a pure collaborative filtering technique. Experimental results from the comparison between the pure collaborative and the cascade content-based approaches demonstrate the efficiency of the first level. On the other hand, the comparison between the cascade content-based and the cascade hybrid approaches demonstrates the efficiency of the second level and justifies the use of the collaborative filtering method in the second level.

Piraeus, Greece, March 2015

Aristomenis S. Lampropoulos

George A. Tsihrintzis


We would like to thank Prof. Dr. Lakhmi C. Jain for agreeing to include this monograph in the Intelligent Systems Reference Library (ISRL) book series of Springer that he edits. We would also like to thank Prof. Dumitru Dan Burdescu of the University of Craiova, Romania, for writing a foreword to the monograph.

Finally, we would like to thank the Springer staff for their excellent work in typesetting and publishing this monograph.


1 Introduction

Abstract Recent advances in electronic media and computer networks have allowed the creation of large and distributed repositories of information. However, the immediate availability of extensive resources for use by broad classes of computer users gives rise to new challenges in everyday life. These challenges arise from the fact that users cannot exploit available resources effectively when the amount of information requires prohibitively long user time spent on acquaintance with and comprehension of the information content. Thus, the risk of information overload of users imposes new requirements on the software systems that handle the information. One of these requirements is the incorporation into the software systems of mechanisms that help their users when they face difficulties during human-computer interaction sessions or lack the knowledge to make decisions by themselves. Such mechanisms attempt to identify user information needs and to personalize human-computer interactions.

(Personalized) Recommender Systems (RS) provide an example of software systems that attempt to address some of the problems caused by information overload. This chapter provides an introduction to Recommender Systems.

1.1 Introduction to Recommender Systems

RS are defined in [16] as software systems in which "people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients." Today, the term includes a wider spectrum of systems, describing any system that provides individualization of the recommendation results and guides users in a personalized way to interesting or useful objects in a large space of possible options. RS form an important research area because of the abundance of their potential practical applications.

Clearly, the functionality of RS is similar to the social process of recommendation and reduction of information that is useless or uninteresting to the user. Thus, one might consider RS as similar to search engines or information retrieval systems.

However, RS are to be differentiated from search engines or information retrieval systems, as a RS not only finds results, but additionally uses its embedded individualization and personalization mechanisms to select objects (items) that satisfy the specific needs of the querying user. Thus, unlike search engines or information retrieval systems, a RS provides information in a way that will be most appropriate and valuable to its users and prevents them from being overwhelmed by huge amounts of information that, in the absence of RS, they would otherwise have to browse or examine. This is to be contrasted with the target of a search engine or an information retrieval system, which is to "match" items to the user query. This means that a search engine or an information retrieval system tries to form and return a ranked list of all those items that match the query. Techniques of active learning such as relevance feedback may give these systems the ability to refine their results according to the user preferences and, thus, provide a simple form of recommendation. More complex search engines such as Google utilize other kinds of criteria such as "authoritativeness", which aim at returning as many useful results as possible, but not in an individualized way.

A.S. Lampropoulos and G.A. Tsihrintzis, Machine Learning Paradigms, Intelligent Systems Reference Library 92, DOI 10.1007/978-3-319-19135-5_1

A learning-based RS typically works as follows: (1) the recommender system collects all given recommendations in one place and (2) then applies a learning algorithm to them. Predictions are then made either with a model learnt from the dataset (model-based predictions), using, for example, a clustering algorithm [3, 18], or on the fly (memory-based predictions), using, for example, a nearest neighbor algorithm [3, 15]. A typical prediction can be a list of the top-N recommendations or a requested prediction for a single item [7].

Memory-based methods store the training instances during training, and these are retrieved when making predictions. In contrast, model-based methods generalize from the training instances into a model during training, and this model needs to be updated regularly; the model is then used to make predictions. Memory-based methods learn fast but make slow predictions, while model-based methods make fast predictions but learn slowly.
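As an illustration of a memory-based prediction, the sketch below implements a user-based nearest-neighbor predictor: nothing is trained in advance, and the stored ratings are scanned at query time. The choice of cosine similarity over co-rated items, the neighborhood size `k`, and the dict-of-dicts rating layout are assumptions made for this example, not details taken from the cited works.

```python
import math

def user_similarity(ra, rb):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = math.sqrt(sum(ra[i] ** 2 for i in common))
    nb = math.sqrt(sum(rb[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0

def predict_memory_based(ratings, user, item, k=2):
    """Predict R(user, item) on the fly from the k most similar users who
    rated `item`, using a similarity-weighted average of their ratings."""
    neighbours = [
        (user_similarity(ratings[user], r), r[item])
        for u, r in ratings.items()
        if u != user and item in r
    ]
    neighbours.sort(reverse=True)  # most similar users first
    top = [(s, r) for s, r in neighbours[:k] if s > 0]
    if not top:
        return None  # no usable neighbour: prediction undefined
    return sum(s * r for s, r in top) / sum(s for s, _ in top)

# Hypothetical ratings on a 1..5 scale.
ratings = {
    "u1": {"i1": 5, "i2": 3},
    "u2": {"i1": 5, "i2": 3, "i3": 4},
    "u3": {"i1": 5, "i2": 3, "i3": 2},
}
print(predict_memory_based(ratings, "u1", "i3"))  # about 3.0
```

A model-based counterpart would instead fit, say, cluster centroids once and answer queries from them, trading slower training for faster prediction, as described above.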

The roots of RS can be traced back to Malone et al. [11], who proposed three forms of filtering: cognitive filtering (now called content-based filtering), social filtering (now called collaborative filtering (CF)) and economic filtering. They also suggested that the best approach was probably to combine these approaches into the category of so-called hybrid RS.

1.2 Formulation of the Recommendation Problem

In general, the recommendation problem is defined as the problem of estimating ratings for the items that have not been seen by a user. This estimation is based on:

• ratings given by the user to other items,

• ratings given to an item by other users,

• and other user and item information (e.g. item characteristics, user demographics).

The recommendation problem can be formulated [1] as follows:

Let $U = \{u_1, u_2, \ldots, u_m\}$ be the set of all users and let $I = \{i_1, i_2, \ldots, i_n\}$ be the set of all possible items that can be recommended, such as music files, images, movies, etc. The space $I$ of possible items can be very large.

Let $f$ be a utility function that measures the usefulness of item $i$ to user $u$,

$$f : U \times I \to R, \tag{1.1}$$

where $R$ is a totally ordered set (e.g., the set of nonnegative integers or real numbers within a certain range). Then, for each user $u \in U$, we want to choose an item $i_u \in I$ that maximizes the user utility function, i.e.,

$$\forall u \in U, \quad i_u = \arg\max_{i \in I} f(u, i). \tag{1.2}$$
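Eq. (1.2) amounts to a per-user argmax. A minimal sketch, assuming the (known or estimated) utilities are held in a dictionary mapping each user to `{item: f(u, item)}`; this layout and the function name are illustrative choices only:

```python
def best_item_per_user(utility):
    """Eq. (1.2): for every user u, return the item i that maximizes f(u, i).

    `utility` maps user -> {item: f(u, item)}. Only (u, i) pairs for which
    f is known or has been estimated are considered; ties go to the item
    listed first.
    """
    return {u: max(scores, key=scores.get) for u, scores in utility.items() if scores}

utility = {"u1": {"i1": 3, "i2": 5, "i3": 1}, "u2": {"i1": 4, "i2": 2}}
print(best_item_per_user(utility))  # {'u1': 'i2', 'u2': 'i1'}
```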

In RS, the utility of an item is usually represented by a rating, which indicates how much a particular user liked a particular item; e.g., user $u_1$ gave the object $i_1$ the rating $R(1,1) = 3$, where $R(u,i) \in \{1,2,3,4,5\}$.

Each user $u_k$, where $k = 1, 2, \ldots, m$, has a list of items $I_{u_k}$ about which the user has expressed his/her preferences. It is important to note that $I_{u_k} \subseteq I$, and it is also possible for $I_{u_k}$ to be the empty set. The latter means that users are not required to express their preferences for all existing items.

Each element of the user space $U$ can be defined with a profile that includes various user characteristics, such as age, gender, income, marital status, etc. In the simplest case, the profile can contain only a single (unique) element, such as a User ID.

Recommendation algorithms operate on the rating matrix $R$, whose rows correspond to users and whose columns correspond to items, in one of two ways:

• either on rows of the matrix $R$, which correspond to the ratings of a single user for different items,

• or on columns of the matrix $R$, which correspond to different users' ratings for a single item.
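The difference between row-wise and column-wise operation can be seen with the simplest statistic, the mean rating. The sketch assumes an $m \times n$ matrix stored as a list of lists, with `None` marking a missing rating (a hypothetical encoding chosen for this example):

```python
def row_mean(R, u):
    """Row operation: mean rating of user u over the items he/she has rated."""
    vals = [r for r in R[u] if r is not None]
    return sum(vals) / len(vals) if vals else None

def column_mean(R, i):
    """Column operation: mean rating of item i over the users who rated it."""
    vals = [row[i] for row in R if row[i] is not None]
    return sum(vals) / len(vals) if vals else None

# Rows are users u1..u3, columns are items i1..i3; None = not rated.
R = [
    [3, None, 5],
    [4, 2, None],
    [None, 1, 5],
]
print(row_mean(R, 0), column_mean(R, 1))  # 4.0 1.5
```

User-based techniques build on row statistics of this kind, while item-based techniques build on column statistics.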

However, in general, the utility function can be an arbitrary function, including a profit function. Depending on the application, a utility $f$ can either be specified by the user, as is often done for user-defined ratings, or computed by the application, as can be the case for a profit-based utility function.

Similarly, each element of the item space $I$ is defined via a set of characteristics.

The central problem of RS lies in the fact that a utility function $f$ is usually not defined on the entire $U \times I$ space, but only on some subset of it. This means that $f$ needs to be generalized to the entire space $U \times I$. In RS, a utility is typically represented by ratings and is initially defined only on the items previously rated by the users.

Generalizations from known to unknown ratings are usually done by:

• specifying heuristics that define the utility function and empirically validating its performance, or

• estimating the utility function that optimizes a certain performance criterion, such as Mean Absolute Error (MAE).
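The Mean Absolute Error mentioned above is the average magnitude of the difference between predicted and actual ratings over $N$ held-out (user, item) pairs, $\mathrm{MAE} = \frac{1}{N}\sum_{k=1}^{N} |p_k - r_k|$. A short sketch (the function name is illustrative):

```python
def mean_absolute_error(predicted, actual):
    """MAE = (1/N) * sum over k of |p_k - r_k| for N held-out ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

print(mean_absolute_error([3.5, 4.0, 2.0], [4, 4, 1]))  # 0.5
```

A lower MAE means the estimated utility reproduces the known ratings more closely.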


Once the unknown ratings are estimated, actual recommendations of an item to a user are made by selecting the highest rating among all the estimated ratings for that user, according to Eq. (1.2). Alternatively, we can recommend the $N$ best items to a user. Additionally, we can recommend a set of users to an item.

1.2.1 The Input to a Recommender System

The input to a RS depends on the type of the filtering algorithm employed. The input belongs to one of the following categories:

1. Ratings (also called votes), which express the opinion of users on items. Ratings are normally provided by the user and follow a specified numerical scale (example: 1-bad to 5-excellent). A common rating scheme is the binary rating scheme, which allows only ratings of either 0 or 1. Ratings can also be gathered implicitly from the user's purchase history, web logs, hyperlink visits, browsing habits or other types of information access patterns.

2. Demographic data, which refer to information such as the age, the gender and the education of the users. This kind of data is usually difficult to obtain. It is normally collected explicitly from the user.

3. Content data, which are based on content analysis of items rated by the user. The features extracted via this analysis are used as input to the filtering algorithm in order to infer a user profile.

1.2.2 The Output of a Recommender System

The output of a RS can be either a prediction or a recommendation.

• A prediction is expressed as a numerical value, $R_{a,j} = R(u_a, i_j)$, which represents the anticipated opinion of active user $u_a$ for item $i_j$. This predicted value must lie within the same numerical scale (example: 1-bad to 5-excellent) as the opinions provided initially by active user $u_a$. This form of RS output is also known as Individual Scoring.

• A recommendation is expressed as a list of $N$ items, where $N \le n$, which the active user is expected to like the most. The usual approach in that case requires this list to include only items that the active user has not already purchased, viewed or rated. This form of RS output is also known as Top-N Recommendation or Ranked Scoring.
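The two output forms are related: given individual predictions $R(u_a, i_j)$, a Top-N list can be produced by discarding items the active user has already seen and keeping the $N$ highest predictions. A minimal sketch, where the input dictionaries and the function name are assumptions for illustration:

```python
def top_n(predicted, already_seen, n):
    """Top-N recommendation: the n unseen items with the highest predicted
    rating, highest first (ties keep their input order)."""
    unseen = [(item, score) for item, score in predicted.items()
              if item not in already_seen]
    unseen.sort(key=lambda pair: pair[1], reverse=True)  # stable sort
    return [item for item, _ in unseen[:n]]

predicted = {"i1": 4.5, "i2": 3.0, "i3": 4.8, "i4": 2.1}
print(top_n(predicted, already_seen={"i3"}, n=2))  # ['i1', 'i2']
```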


1.3 Methods of Collecting Knowledge About User Preferences

To generate personalized recommendations that are tailored to the specific needs of the active user, RS collect ratings of items by users and build user-profiles in ways that depend on the methods that the RS utilize to collect personal information about user preferences. In general, these methods are categorized into three approaches:

• an Implicit approach, which is based on recording user behavior,

• an Explicit approach, which is based on user interrogation,

• a Mixing approach, which is a combination of the previous two.

1.3.1 The Implicit Approach

This approach does not require active user involvement in the knowledge acquisition task; instead, the user's behavior is recorded, specifically the way that he/she reacts to each incoming piece of data. The goal is to learn from the user's reaction about the relevance of the data item to the user. Typical examples of implicit ratings are purchase data or reading time of Usenet news [15]. In the CF system of [9], reading times were monitored as an indicator of relevance, which revealed a relationship between the time spent on reviewing data items and their relevance. In [6], the system learns the user profile by passively observing the hyperlinks clicked on and those passed over and by measuring user mouse and scrolling activity in addition to user browsing activity. Also, in [14], agents are utilized that operate as adaptive Web site RS. Through analysis of Web logs and web page structure, the agents infer knowledge of the popularity of various documents as well as of document similarity. By tracking user actions and his/her acceptance of the agent recommendations, the agent can make further estimations about future recommendations to the specific user. The main benefits of implicit feedback over explicit ratings are that it removes the cognitive cost of providing relevance judgements explicitly and that it can be gathered in large quantities and aggregated to infer item relevance [8].

However, the implicit approach has some serious drawbacks. For instance, some purchases are gifts and, thus, do not reflect the active user's interests. Moreover, the inference that purchasing implies liking does not always hold. Owing to the difficulty of acquiring explicit ratings, some providers of product recommendation services adopt bilateral approaches. For instance, Amazon.com computes recommendations based on explicit ratings whenever possible and uses observed implicit ratings instead when explicit ones are unavailable.
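The bilateral strategy described above can be sketched as a simple fallback rule: use the explicit rating when it exists, otherwise derive a binary implicit rating from observed actions. The action names (`"purchase"`, `"long_read"`) and the mapping to a rating of 1 are hypothetical choices for illustration; this is not how any particular provider, Amazon.com included, actually computes its ratings.

```python
def effective_rating(user, item, explicit, implicit_events):
    """Prefer an explicit rating; otherwise fall back to an implicit signal.

    explicit:        {(user, item): rating on the 1..5 scale}
    implicit_events: {(user, item): set of observed action names}
    The action-to-rating mapping below is a hypothetical illustration.
    """
    if (user, item) in explicit:
        return explicit[(user, item)], "explicit"
    events = implicit_events.get((user, item), set())
    if "purchase" in events or "long_read" in events:
        # Caveat from the text: a purchase may be a gift, so this is noisy.
        return 1, "implicit"
    return None, "unknown"

explicit = {("u1", "i1"): 4}
events = {("u1", "i2"): {"purchase"}, ("u1", "i3"): {"click"}}
print(effective_rating("u1", "i2", explicit, events))  # (1, 'implicit')
```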


1.3.2 The Explicit Approach

Users are required to explicitly specify their preference for any particular item, usually by indicating their extent of appreciation on 5-point or 7-point Thurstone scales.

These scales are mapped to numeric values, e.g., $R_{i,j} \in \{1,2,3,4,5\}$. Lower values commonly indicate the least favorable preferences, while higher values express the user's liking.¹ Explicit ratings impose additional effort on users. Consequently, users often tend to avoid the burden of explicitly stating their preferences and either leave the system or rely upon "free-riding" [2]. Ratings made on these scales allow these judgments to be processed statistically to provide averages, ranges, or distributions.
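The statistical processing mentioned at the end of the paragraph is straightforward once ratings are numeric. A small sketch using only the Python standard library, assuming a 5-point scale:

```python
from collections import Counter
from statistics import mean

def summarize(ratings, scale=(1, 2, 3, 4, 5)):
    """Average, range and distribution of explicit ratings on a fixed scale."""
    if not ratings or any(r not in scale for r in ratings):
        raise ValueError("ratings must be non-empty and lie on the scale")
    counts = Counter(ratings)
    return {
        "average": mean(ratings),
        "range": (min(ratings), max(ratings)),
        "distribution": {v: counts.get(v, 0) for v in scale},
    }

print(summarize([5, 4, 4, 2, 5]))
# {'average': 4, 'range': (2, 5), 'distribution': {1: 0, 2: 1, 3: 0, 4: 2, 5: 2}}
```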

A central feature of explicit ratings is that the user who evaluates items has to examine them and, then, to assign to them values from the rating scale. This imposes a cognitive cost on the evaluator to assess the performance of an object [12].

1.3.3 The Mixing Approach

Newsweeder [10], a Usenet filtering system, is an example of a system that uses a combination of the explicit and the implicit approach, as it requires minimum user involvement. In this system, the users are required to rate documents for their relevance. The ratings are used as training examples for a machine learning algorithm that is executed nightly to generate user interest profiles for the next day. Newsweeder is successful in reducing user involvement. However, the batch profiling used in Newsweeder is a shortcoming as profile adaptation is delayed significantly.

1.4 Motivation of the Book

The motivation of this book is based on the following facts, which constitute important open research problems in RS. It is well known that users rarely provide explicit feedback in RS. More specifically, users tend to provide ratings only for items that they are interested in and that belong to their preferences, and avoid providing feedback in the form of negative examples, i.e., items that they dislike or are not interested in. As stated in [5, 17], "It has been known for long time in human computer interaction that users are extremely reluctant to perform actions that are not directed towards their immediate goal if they do not receive immediate benefits".

However, common RS based on machine learning approaches use classifiers that, in order to learn user interests, require both positive examples (desired items that users prefer) and negative examples (items that users dislike or are not interested in). Additionally, collecting negative examples is arduous, as these examples should uniformly represent the entire set of items, excluding the class of positive items. Manually collected negative samples could be biased and require additional effort by users.

¹ The Thurstone scale was used in psychology for measuring an attitude. It was developed by Louis Leon Thurstone in 1928, as a means of measuring attitudes towards religion. It is made up of statements about a particular issue. A numerical value is associated with each statement, indicating how favorable or unfavorable the statement is judged to be.

Moreover, especially in web applications, users find it very difficult to provide personal data and prefer not to be associated with Internet sites, owing to a lack of trust in the privacy of modern web sites [5, 17]. Therefore, RS based on demographic data or on stereotypes derived from such data are very limited, since there is a high probability that the user-supplied information suffers from noise induced by the fact that users usually give fake information in many of these applications.

Thus, RS need machine learning methods that utilize only positive examples provided by users, without additional information either in the form of negative examples or in the form of personal information. PEBL [19] is an example of a RS to which only positive examples are supplied by its users. Specifically, PEBL is a web page classification approach that works within the framework of learning based only on positive examples and uses the mapping-convergence algorithm combined with SVM.
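To show what "learning from positive examples only" means in the simplest possible terms, the sketch below trains a toy one-class classifier on positive items alone: it stores their centroid and accepts a new item if it lies within the largest training distance (scaled by a hypothetical `slack` factor). This is deliberately much simpler than PEBL's mapping-convergence algorithm with SVM and is not a reimplementation of it; the class name and parameters are hypothetical.

```python
import math

class CentroidOneClass:
    """Toy one-class classifier trained on positive examples only.

    Stores the centroid of the positives and accepts a new item when its
    Euclidean distance to the centroid does not exceed the largest
    training distance multiplied by `slack` (a hypothetical parameter).
    """

    def __init__(self, slack=1.0):
        self.slack = slack
        self.centroid = None
        self.radius = None

    def fit(self, positives):
        dim = len(positives[0])
        self.centroid = [sum(p[d] for p in positives) / len(positives)
                         for d in range(dim)]
        self.radius = max(math.dist(p, self.centroid) for p in positives) * self.slack
        return self

    def predict(self, x):
        """Return True if x is judged to belong to the positive class."""
        return math.dist(x, self.centroid) <= self.radius

clf = CentroidOneClass().fit([[1, 1], [1, 3], [3, 1], [3, 3]])
print(clf.predict([2, 2.5]), clf.predict([5, 5]))  # True False
```

No negative examples appear anywhere in `fit`; the decision boundary is derived entirely from the positives, which is the property the text asks for.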

On the other hand, user profiles can be either explicitly obtained from user ratings or implicitly learnt from the recorded user interaction data (i.e. user play-lists). In the literature, collaborative filtering based on explicit ratings has been widely studied while binary collaborative filtering based on user interaction data has been only partially investigated. Moreover, most of the binary collaborative filtering algorithms treat the items that users have not yet played/watched as the “un-interested in” items (negative class), which, however, is a practically invalid assumption.

Collaborative filtering methods assume the availability of a range of high and low ratings, or of multiple classes, in the Users-Items data matrix. One-class collaborative filtering, proposed in [13], provides weighting and sampling schemes to handle one-class settings with unconstrained factorizations based on the squared loss. Essentially, the idea is to treat all non-positive user-item pairs as negative examples, but to control their contribution to the objective function appropriately via either uniform, user-specific or item-specific weights.
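The weighting idea can be illustrated by building a weight matrix $W$ for a binary matrix $R$ and evaluating the weighted squared loss $\sum_{u,i} W_{ui}(R_{ui} - X_{ui})^2$ of some approximation $X$. The concrete formulas used here are assumptions for illustration: positives get weight 1; unobserved entries get a constant `delta` (uniform), `delta` times the user's number of positives (user-specific), or `delta` times the number of users who did not mark the item (item-specific). The factorization that would minimize this loss is not shown.

```python
def occf_weights(R, scheme="uniform", delta=0.1):
    """Weights for a binary user-item matrix R (1 = positive, 0 = unobserved).

    Positive entries get weight 1. Unobserved entries are treated as
    negatives whose confidence depends on `scheme` (assumed formulas):
      uniform: delta for every unobserved entry
      user:    delta * number of positives of the user
      item:    delta * number of users who did NOT mark the item
    """
    m, n = len(R), len(R[0])
    user_pos = [sum(row) for row in R]
    item_pos = [sum(R[u][i] for u in range(m)) for i in range(n)]
    W = [[0.0] * n for _ in range(m)]
    for u in range(m):
        for i in range(n):
            if R[u][i] == 1:
                W[u][i] = 1.0
            elif scheme == "uniform":
                W[u][i] = delta
            elif scheme == "user":
                W[u][i] = delta * user_pos[u]
            elif scheme == "item":
                W[u][i] = delta * (m - item_pos[i])
            else:
                raise ValueError(f"unknown scheme: {scheme}")
    return W

def weighted_squared_loss(R, X, W):
    """Sum over all (u, i) of W[u][i] * (R[u][i] - X[u][i]) ** 2."""
    return sum(
        W[u][i] * (R[u][i] - X[u][i]) ** 2
        for u in range(len(R))
        for i in range(len(R[0]))
    )

R = [[1, 0, 0], [1, 1, 0]]
X = [[0.5] * 3 for _ in range(2)]
print(weighted_squared_loss(R, X, occf_weights(R, "uniform", delta=0.5)))  # 1.125
```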

Therefore, we must take into consideration that the recommendation process should not only be cast as a classification scheme over users' preferences, as in [19], but should also take into account the opinions of other users in order to eliminate the problem of "local optima" of content-based approaches [5, 17]. On the other hand, pure collaborative approaches have the main drawback that they tend to recommend items that could be biased by a group of users and to ignore information that could be directly related to item content and a specific user's preferences. Thus, an approach is required that pays particular attention to the above matters.

Most of the existing recommendation methods aim at providing accurate recommendations. However, an important factor for a RS is its ability to adapt to user perception and to provide a justification for a recommendation, which allows its recommendations to be accepted and trusted by users. Recommendations based only on ratings, without taking into account the content of the recommended items, fail to provide qualitative justifications. As stated in [4], "when the users can understand the strengths and limitations of a RS, the acceptance of its recommendations is increased." Thus, new methods are needed that make enhanced use of similarity measures to provide both individualization and an indirect way of justifying the items that are recommended to the users.

1.5 Contribution of the Book

The contribution of this book is two-fold. The first contribution develops, presents and evaluates a content-based RS based on multiple similarity measures that attempt to capture user perception of similarity and to provide individualization and justifications of recommended items according to the similarity measure that was assigned to each user. Specifically, a content-based RS, called MUSIPER,² is presented, which constructs music similarity perception models of its users by associating different similarity measures with different users. A user-supplied relevance feedback procedure and related neural network-based incremental learning allow the system to determine which subset of a full set of objective features most accurately approximates the subjective music similarity perception of a specific user. Our implementation and evaluation of MUSIPER verify the relation between subsets of objective features and individualized music similarity perception and exhibit significant improvement in individualized perceived similarity in subsequent recommended items. Additionally, the investigation of the relation between objective feature subsets and user perception offers an explanation and justification for the items one selects.
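The core idea of assigning each user the feature subset that best matches his/her similarity judgments can be sketched with a much simpler selection rule than MUSIPER's neural-network-based incremental learning. Here, as a hypothetical stand-in, each candidate subset defines a Euclidean distance restricted to its feature indices, and the subset whose `k` nearest items to a seed item overlap most with the items the user marked as similar (relevance feedback) is chosen. Subset names, data, and the overlap criterion are all illustrative assumptions.

```python
import math

def subset_distance(a, b, subset):
    """Euclidean distance restricted to the feature indices in `subset`."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in subset))

def best_subset_for_user(query, catalogue, liked, subsets, k=2):
    """Pick the feature subset whose k nearest items to `query` overlap most
    with the items the user marked as similar during relevance feedback.

    query:     feature vector of the seed item
    catalogue: {item: feature vector}
    liked:     set of items the user judged similar to the seed
    subsets:   {subset name: list of feature indices}
    """
    def overlap(name):
        idx = subsets[name]
        ranked = sorted(catalogue,
                        key=lambda it: subset_distance(query, catalogue[it], idx))
        return len(set(ranked[:k]) & liked)
    return max(subsets, key=overlap)

# Hypothetical 2-feature catalogue: index 0 ~ "timbre", index 1 ~ "rhythm".
catalogue = {"s1": [0.1, 5.0], "s2": [0.2, 6.0], "s3": [5.0, 0.1], "s4": [6.0, 0.2]}
subsets = {"timbre": [0], "rhythm": [1]}
print(best_subset_for_user([0.0, 0.0], catalogue, {"s3", "s4"}, subsets))  # rhythm
```

Grouping users by the subset returned for each of them yields user clusters of the kind discussed in the next paragraph.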

The selection of the objective feature subsets in MUSIPER was based on a semantic categorization of the features, which formed groups of features reflecting semantically different aspects of the music signal. This semantic categorization helped us to formulate indirect relations between a user’s specific perception and the corresponding item similarity (in this case, music similarity) that involves his/her preferences. Thus, the features in a specific feature subset provide a justification for the factors that influence the specific user’s perception of similarity between objects and, consequently, his/her preferences. As observed, no single feature subset outperformed the other subsets for all users. Moreover, it was experimentally observed that the eleven feature subsets in MUSIPER clustered its users into eleven corresponding clusters. It was also observed that, in this clustering scheme, empty user clusters appeared, which implies that the corresponding feature subsets failed to model the music similarity perception of any user at all. On the other hand, the clusters of two other feature subsets contained approximately 27 % and 18 % of the users of MUSIPER. These two findings are indicative of the qualitative differences between the corresponding feature subsets. They provide strong evidence in support of our initial hypothesis that relates feature subsets to the similarity perception of an individual. Additionally, they indicate that users tend to concentrate around particular factors (features) that eventually influence their perception of item similarity and corresponding item preferences.

² MUSIPER is an acronym that stands for MUsic SImilarity PERception.

The second contribution of this book concerns the development and evaluation of a hybrid cascade RS that utilizes only positive examples from a user. Specifically, a content-based RS is combined with collaborative filtering techniques, primarily in order to predict ratings and, secondarily, to exploit the content-based component to improve the quality of recommendations. Our approach focuses on:

1. using only positive examples provided by each user and

2. avoiding the “local optima” of the content-based RS component, which tends to recommend only items similar to those a specific user has already seen, without allowing him/her to view the full spectrum of items. Thus, a need arises to enhance the system with collaborative filtering techniques that combine the interests of users who are comparable to the specific user.

Thus, we decompose the recommendation problem into a two-level cascade recommendation scheme. In the first level, we formulate a one-class classification problem based on content-based features of items in order to incorporate the individualized (subjective) user preferences into the recommendation process. In the second level, we apply either a content-based approach or a collaborative filtering technique to assign a corresponding rating degree to these items. Our realization and evaluation of the proposed cascade hybrid recommender approach demonstrate its efficiency. Our recommendation approach benefits from both content-based and collaborative filtering methodologies. The content-based level eliminates the drawbacks of pure collaborative filtering, which does not take into account the subjective preferences of an individual user, as it is biased towards the items most preferred by the remaining users. On the other hand, the collaborative filtering level eliminates the drawbacks of the pure content-based recommender, which ignores any beneficial information related to users with similar preferences. The combination of the two approaches into a cascade form mimics the social process where someone has selected some items according to his/her preferences and, to make a better selection, seeks opinions about these from others.
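A minimal sketch of the two-level cascade idea follows. It is written only to make the data flow concrete and does not reproduce the methods of Chaps. 6 and 7. As an assumption, the first level is a simple centroid-and-threshold filter standing in for a one-class SVM. The second level predicts a rating as a cosine-weighted average of other users' ratings, a basic user-based collaborative filtering rule. The function names, the 0.9 threshold and the toy data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def first_level_accept(positive_feats, candidate_feats, threshold=0.9):
    """Level 1, a stand-in for a one-class classifier: accept candidate items whose
    content features lie close to the centroid of the user's positive examples."""
    dim = len(positive_feats[0])
    centroid = [sum(f[d] for f in positive_feats) / len(positive_feats) for d in range(dim)]
    return [item for item, f in candidate_feats.items() if cosine(centroid, f) >= threshold]

def second_level_rating(user_vec, neighbours, item):
    """Level 2: similarity-weighted average of other users' ratings of `item`.
    `neighbours` maps a user id to (rating vector aligned with user_vec, {item: rating})."""
    num = den = 0.0
    for vec, ratings in neighbours.values():
        if item in ratings:
            w = cosine(user_vec, vec)
            num += w * ratings[item]
            den += abs(w)
    return num / den if den else None

def cascade(positive_feats, candidate_feats, user_vec, neighbours):
    """Run level 1, then predict ratings only for the items that level 1 accepted."""
    return {item: second_level_rating(user_vec, neighbours, item)
            for item in first_level_accept(positive_feats, candidate_feats)}

# Hypothetical toy data: 3 content features per item, 2 co-rated items per user.
positives = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3]]
candidates = {"a": [1.0, 0.05, 0.25], "b": [0.0, 1.0, 0.0]}
user_vec = [5, 4]
neighbours = {"v": ([5, 4], {"a": 4}), "w": ([1, 5], {"a": 2})}
result = cascade(positives, candidates, user_vec, neighbours)
```

Here only item "a" passes the first level, so only it receives a predicted rating. Replacing the stand-in filter with a real one-class classifier would leave this two-stage structure unchanged.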

1.6 Outline of the Book

The book is organized as follows:

In Chap. 2, related works are presented on approaches that address fundamental problems of RSs. In Chap. 3, we present the general problem and the key definitions, paradigms, and results of the scientific discipline of learning, with particular emphasis on machine learning. More specifically, we focus on statistical learning and the two main paradigms that have developed in statistical inference: the parametric paradigm and the general non-parametric paradigm. We concentrate our analysis on classification problems solved with the use of Support Vector Machines (SVM), as applicable to our recommendation approaches. Particularly, we summarize the One-Class Classification approach and the application of One-Class SVM Classification to the recommendation problem.

Next, Chap. 4 presents features that are utilized to analyze the content of multimedia data. Specifically, we present the MPEG-7 framework, a widely adopted standard for describing multimedia content. Additionally, we present the MARSYAS framework for the extraction of features from audio files.

In Chap. 5, the content-based RS called MUSIPER is presented and analyzed. MUSIPER uses multiple similarity measures in order to capture the perception of similarity of different users and to provide individualization and justifications for items recommended according to the similarity measure assigned to each user.

In the following two chapters, Chaps. 6 and 7, we present our cascade recommendation methods based on a two-level combination of one-class SVM classifiers with collaborative filtering techniques.

Finally, we summarize the book, draw conclusions and point to future related research work in Chap. 8.

References

1. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17, 734–749 (2005)

2. Avery, C., Zeckhauser, R.: Recommender systems for evaluating computer messages. Commun. ACM 40(3), 88–89 (1997). doi:10.1145/245108.245127

3. Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence, pp. 43–52. Morgan Kaufmann (1998)

4. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative filtering recommendations. In: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work CSCW’00, pp. 241–250. ACM, New York (2000). doi:10.1145/358916.358995

5. Schwab, I., Kobsa, A., Koychev, I.: Learning user interests through positive examples using content analysis and collaborative filtering (2001). http://citeseer.ist.psu.edu/schwab01learning.html

6. Goecks, J., Shavlik, J.: Learning users’ interests by unobtrusively observing their normal behavior. In: Proceedings of International Conference on Intelligent User Interfaces, pp. 129–132. ACM Press (2000)

7. Karypis, G.: Evaluation of item-based top-n recommendation algorithms. In: Proceedings of the Tenth International Conference on Information and Knowledge Management CIKM’01, pp. 247–254. ACM, New York (2001). doi:10.1145/502585.502627

8. Kelly, D., Teevan, J.: Implicit feedback for inferring user preference: a bibliography. SIGIR Forum 37(2), 18–28 (2003). doi:10.1145/959258.959260

9. Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: applying collaborative filtering to usenet news. Commun. ACM 40(3), 77–87 (1997)

10. Lang, K.: Newsweeder: learning to filter netnews. In: Proceedings of 12th International Machine Learning Conference (ML95), pp. 331–339 (1995)

11. Malone, T.W., Grant, K.R., Turbak, F.A., Brobst, S.A., Cohen, M.D.: Intelligent information-sharing systems. Commun. ACM 30(5), 390–402 (1987). doi:10.1145/22899.22903

12. Nichols, D.M.: Implicit rating and filtering. In: Proceedings of the Fifth DELOS Workshop on Filtering and Collaborative Filtering, pp. 31–36 (1997)

13. Pan, R., Zhou, Y., Cao, B., Liu, N.N., Lukose, R., Scholz, M., Yang, Q.: One-class collaborative filtering. In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining ICDM’08, pp. 502–511. IEEE Computer Society, Washington (2008). doi:10.1109/ICDM.2008.16

14. Pazzani, M.J.: A framework for collaborative, content-based and demographic filtering. Artif. Intell. Rev. 13(5–6), 393–408 (1999). doi:10.1023/A:1006544522159

15. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of Computer Supported Cooperative Work Conference, pp. 175–186. ACM Press (1994)

16. Resnick, P., Varian, H.R.: Recommender systems. Commun. ACM 40(3), 56–57 (1997)

17. Schwab, I., Pohl, W., Koychev, I.: Learning to recommend from positive evidence. In: Proceedings of the 5th International Conference on Intelligent User Interfaces IUI ’00, pp. 241–247. ACM, New York (2000). doi:10.1145/325737.325858

18. Ungar, L.H., Foster, D.P.: Clustering methods for collaborative filtering. In: Proceedings of AAAI Workshop on Recommendation Systems. AAAI Press (1998)

19. Yu, H., Han, J., Chang, K.C.C.: PEbL: web page classification without negative examples. IEEE Trans. Knowl. Data Eng. 16(1), 70–81 (2004). doi:10.1109/TKDE.2004.1264823

Chapter 2 Review of Previous Work Related to Recommender Systems

Abstract The large amount of information resources available to users imposes new requirements on the software systems that handle the information. This chapter reviews the state of the art of the main approaches to designing RSs that address the problems caused by information overload. In general, the methods implemented in a RS fall within one of the following categories: (a) Content-based Methods, (b) Collaborative Methods and (c) Hybrid Methods.

2.1 Content-Based Methods

Modern information systems embed the ability to monitor and analyze users’ actions to determine the best way to interact with them. Ideally, each user’s actions are logged separately and analyzed to generate an individual user profile. All the information about a user, extracted either by monitoring user actions or by examining the objects the user has evaluated [9], is stored and utilized to customize the services offered. This user modeling approach is known as content-based learning. The main assumption behind it is that a user’s behavior remains unchanged over time; therefore, the content of past user actions may be used to predict the desired content of their future actions [4, 27]. Accordingly, in content-based recommendation methods, the rating $R(u,i)$ of item i for user u is typically estimated based on the ratings assigned by user u to the items $i_n \in I$ that are “similar” to item i in terms of their content, as defined by their associated features.

To be able to search through a collection of items and make observations about the similarity between objects that are not directly comparable, we must transform raw data to a certain level of information granularity. Information granules are collections of data that contain only essential information. Such granulation allows more efficient processing for extracting features and computing numerical representations that characterize an item. As a result, the large amount of detailed information about an item is reduced to a limited set of features. Each feature is a vector of low dimensionality, which captures some aspect of the item and can be used to determine item similarity. Therefore, an item i can be described by a feature vector

$$F(i) = [\mathrm{feature}_1(i), \mathrm{feature}_2(i), \mathrm{feature}_3(i), \ldots, \mathrm{feature}_n(i)]. \tag{2.1}$$

A.S. Lampropoulos and G.A. Tsihrintzis, Machine Learning Paradigms, Intelligent Systems Reference Library 92, DOI 10.1007/978-3-319-19135-5_2


For example, in a music recommendation application, in order to recommend music files to user u, the content-based RS attempts to build a profile of the user’s preferences based on features present in the music files that user u has rated highly. Consequently, only music files that have a high degree of similarity with these highly rated files are recommended to the user. This method is known as “item-to-item correlation” [41]. The type of user profile derived by a content-based RS depends on the learning method utilized by the system.

This approach to the recommendation process has its roots in information retrieval and information filtering [3, 36]. Retrieval-based approaches utilize interactive learning techniques, such as relevance feedback methods, in order to organize and retrieve data in an effective, personalized way. In relevance feedback methods, the user is part of the item-management process, which means that the user evaluates the results provided by the system. Then, the system adapts its performance according to the user’s preferences. In this way, relevance feedback is able not only to take into account the user’s subjectivity in perceiving the content of items, but also to narrow the gap between high-level semantics and the low-level features that are usually used for the content description of items [12, 13, 35].

Besides the heuristics that are based mostly on information retrieval methods [3, 12, 13, 35, 36], such as the Rocchio algorithm or correlation-based schemes, other techniques for content-based recommendation utilize Pattern Recognition/Machine Learning approaches, such as Bayesian classifiers [28], clustering methods, decision trees, and artificial neural networks.
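The Rocchio algorithm mentioned above is a classic relevance feedback rule from information retrieval. It moves the query (or profile) vector towards the centroid of the items the user marked as relevant and away from the centroid of those marked as non-relevant: q' = αq + β·mean(relevant) − γ·mean(non-relevant). The sketch below assumes the commonly quoted weights α = 1, β = 0.75 and γ = 0.15 and clips negative components to zero. Both choices are conventions, not requirements of the method.

```python
def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """One round of Rocchio relevance feedback on a query/profile vector."""
    dim = len(query)

    def mean(vectors):
        # Centroid of a list of vectors; a zero vector if the list is empty.
        if not vectors:
            return [0.0] * dim
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    rel, non = mean(relevant), mean(nonrelevant)
    # Negative components are clipped to zero, a common convention.
    return [max(0.0, alpha * q + beta * r - gamma * n) for q, r, n in zip(query, rel, non)]

# Hypothetical two-feature profile: the user marks an item rich in feature 2 as relevant
# and an item rich in feature 1 as non-relevant.
updated = rocchio_update([1.0, 0.0], relevant=[[0.0, 1.0]], nonrelevant=[[1.0, 0.0]])
```

After one feedback round, the profile has gained weight on the second feature and lost some on the first. This is how the user's judgments reshape subsequent retrievals.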

These techniques differ from information retrieval-based approaches in that they calculate utility predictions not from a heuristic formula, such as a cosine similarity measure, but from a model learnt from the underlying data using statistical and machine learning techniques. For example, based on a set of Web pages that were rated by the user as “relevant” or “irrelevant,” the naive Bayesian classifier is used in [28] to classify unrated Web pages.
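As a sketch of the kind of classifier used in [28], the code below trains a multinomial naive Bayes model on pages labeled “relevant” or “irrelevant” and classifies a new page by its bag of words. The multinomial event model, add-one (Laplace) smoothing, the word lists and the tiny training set are all assumptions; [28] may differ in these details.

```python
import math
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (words, label) pairs. Returns class priors, per-class word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    priors = {label: n / len(docs) for label, n in label_counts.items()}
    return priors, word_counts, vocab

def classify(model, words):
    """Return the label with the highest log-posterior for a bag of words."""
    priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps an unseen (word, label) pair from zeroing the score.
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical pages already judged by the user.
training_pages = [
    (["jazz", "saxophone", "concert"], "relevant"),
    (["jazz", "piano", "trio"], "relevant"),
    (["football", "score", "league"], "irrelevant"),
    (["election", "vote", "league"], "irrelevant"),
]
model = train_naive_bayes(training_pages)
```

Unlike a fixed similarity heuristic, the word probabilities here are learnt from the user's own judgments. This is the distinction the paragraph above draws.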

Some examples of content-based methods come from the area of music data. The systems in [10, 19, 24, 25, 47] recommend pieces that are similar to users’ favorites in terms of music content, such as mood and rhythm. This allows a rich variety of artists and pieces, including unrated ones, to be recommended. To achieve this, it is necessary to associate user preferences with music content using a practical database in which most users tend to rate only a few pieces as favorites.

A relevance feedback approach for music recommendation was presented in [19], based on the TreeQ vector quantization process initially proposed by Foote [14]. More specifically, relevance feedback was incorporated into the user model by modifying the quantization weights of desired vectors. Also, a relevance feedback music retrieval system based on SVM Active Learning was presented in [25], which retrieves the desired music piece according to mood and style similarity.

In [2], the authors explore the relation between the user’s rating input, musical pieces with a high rating (defined as the listener’s favorite music), and music features. Specifically, labeled music pieces from specific artists were analyzed in order to build a correlation between user ratings and artists through music features. Their system forms the user profile as a preference for music pieces of a specific artist. They confirmed that favorite music pieces were concentrated along certain music features.

The system in [52] develops a user-driven similarity function by combining timbre-, tempo-, genre-, mood-, and year-related features into the overall similarity function. More specifically, similarity is based on a weighted combination of these features, and the end-user can specify his/her personal definition of similarity by weighting them.
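The weighted combination described for [52] can be written as sim(a, b) = Σ_f w_f sim_f(a, b) / Σ_f w_f. Here f ranges over the feature groups (timbre, tempo, genre, mood, year) and w_f are the user's weights. The sketch assumes that each per-group similarity has already been computed and lies in [0, 1]. The normalization by the weight sum and the example weights are illustrative assumptions, not details confirmed for [52].

```python
def user_driven_similarity(facet_sims, weights):
    """Combine per-feature-group similarities (each in [0, 1]) with user-chosen weights.
    Weights are normalized by their sum so that the result also stays in [0, 1]."""
    total = sum(weights.get(f, 0.0) for f in facet_sims)
    if total == 0:
        raise ValueError("at least one feature group needs a positive weight")
    return sum(weights.get(f, 0.0) * s for f, s in facet_sims.items()) / total

# Hypothetical per-group similarities between two pieces of music.
facet_sims = {"timbre": 0.8, "tempo": 0.4, "genre": 1.0, "mood": 0.6, "year": 0.2}
# A hypothetical user who cares mostly about mood and tempo and ignores genre and year.
weights = {"timbre": 1.0, "tempo": 2.0, "genre": 0.0, "mood": 3.0, "year": 0.0}
score = user_driven_similarity(facet_sims, weights)
```

With these weights, the perfect genre match contributes nothing. Two pieces can therefore look dissimilar to this user even though a genre-driven user would consider them close.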

The work in [15] tries to extend the use of signal approximation and characterization from genre classification to the recognition of user taste. The idea is to learn music preferences by applying instance-based classifiers to user profiles. In other words, this system does not build an individual profile for every user, but instead tries to recognize his/her favorite genre by applying instance-based classifiers to the rating preferences expressed in his/her music playlist.
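An instance-based approach of the kind described for [15] can be sketched as a k-nearest-neighbour classifier. Each playlist track is assigned the majority genre among its k closest labeled tracks, and the user's favorite genre is the most frequent prediction over the playlist. Euclidean distance, k = 3 and the two-dimensional toy features are assumptions, not details taken from [15].

```python
import math
from collections import Counter

def knn_genre(labeled, feats, k=3):
    """labeled: list of (feature_vector, genre). Majority genre among the k nearest tracks."""
    nearest = sorted(labeled, key=lambda item: math.dist(item[0], feats))[:k]
    return Counter(genre for _, genre in nearest).most_common(1)[0][0]

def favorite_genre(labeled, playlist, k=3):
    """Classify every track in the playlist and return the most frequent genre."""
    votes = Counter(knn_genre(labeled, f, k) for f in playlist)
    return votes.most_common(1)[0][0]

# Hypothetical labeled reference tracks and a user's playlist (feature vectors only).
labeled = [
    ([0.1, 0.9], "rock"), ([0.2, 0.8], "rock"), ([0.15, 0.85], "rock"),
    ([0.9, 0.1], "jazz"), ([0.8, 0.2], "jazz"), ([0.85, 0.15], "jazz"),
]
playlist = [[0.12, 0.88], [0.2, 0.75], [0.9, 0.12]]
```

No model is fitted in advance: the labeled tracks themselves are the "training", which is what makes the classifier instance-based.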

2.2 Collaborative Methods

CF methods are based on the assumption that similar users prefer similar items or that a user expresses similar preferences for similar items. Instead of performing content indexing or content analysis, CF systems rely entirely on interest ratings from the members of a participating community [18]. CF methods are categorized into two general classes, namely model-based and memory-based [1, 7].

Model-based algorithms use the underlying data to learn a probabilistic model, such as a cluster model or a Bayesian network model [7,53], using statistical and machine learning techniques. Subsequently, they use the model to make predictions.

The clustering model [5,51] works by clustering similar users in the same class and estimating the probability that a particular user is in a particular class. From there, the clustering model computes the conditional probability of ratings.
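The prediction step of the clustering model can be illustrated with soft cluster memberships. The probability that user u gives rating r to item i is the mixture Σ_c P(c | u) P(r | c, i), and its expected value can serve as a point prediction. The membership and per-cluster rating probabilities below are invented numbers. In practice they would be estimated from the data, for example with the EM algorithm, which this sketch does not cover.

```python
def rating_distribution(membership, cluster_rating_probs, item):
    """P(R(u, item) = r) = sum over clusters c of P(c | u) * P(r | c, item)."""
    dist = {}
    for c, p_c in membership.items():
        for r, p_r in cluster_rating_probs[c][item].items():
            dist[r] = dist.get(r, 0.0) + p_c * p_r
    return dist

def expected_rating(dist):
    """Expected value of a rating distribution, used as a point prediction."""
    return sum(r * p for r, p in dist.items())

# Invented numbers: user u belongs to cluster c1 with probability 0.7, c2 with 0.3.
membership = {"c1": 0.7, "c2": 0.3}
cluster_rating_probs = {
    "c1": {"song": {1: 0.1, 3: 0.2, 5: 0.7}},
    "c2": {"song": {1: 0.6, 3: 0.3, 5: 0.1}},
}
dist = rating_distribution(membership, cluster_rating_probs, "song")
```

Because membership is soft, the predicted distribution blends the tastes of both clusters, weighted by how strongly u belongs to each.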

Memory-based methods store raw preference information in computer memory and access it as needed to find similar users or items and to make predictions. In [29], CF was formulated as a classification problem. Specifically, based on a set of user ratings about items, the authors try to induce a model for each user that would allow the classification of unseen items into two or more classes, each of which corresponds to different points in the accepted rating scale.

Memory-based CF methods can be further divided into two groups, namely user-based and item-based [37] methods. On the one hand, user-based methods look for users (also called “neighbors”) similar to the active user and calculate a predicted rating as a weighted average of the neighbors’ ratings on the desired item. On the other hand, item-based methods look for items similar to those the active user has already rated and base the prediction on the active user’s ratings of those items.

2.2.1 User-Based Collaborative Filtering Systems

User-based CF systems are systems that utilize memory-based algorithms, meaning that they operate over the entire user-item matrix R to make predictions. The majority of such systems mainly deal with user-user similarity calculations, meaning that they utilize user neighborhoods, constructed as collections of similar users. In other words, they deal with the rows of the user-item matrix R in order to generate their results. For example, in a personalized music RS called RINGO [43], similarities between the tastes of different users are utilized to recommend music items. This user-based CF approach works as follows: a new user is matched against the database to discover neighbors, which are other customers who, in the past, have had a similar taste to the new user, i.e., who have bought items similar to those the new user has bought. Items (unknown to the new user) that these neighbors like are then recommended to the new user. The main steps of this process are:

1. Representation of Input Data,

2. Neighborhood Formation, and

3. Recommendation Generation.

2.2.1.1 Representation of Input Data

To represent input data, one needs to organize the set of user ratings into a user-item matrix R, where each entry $R(u,i)$ represents the rating value assigned by user u to item i. As users are not obligated to provide their opinion for all items, the resulting user-item matrix may be sparse. This sparsity of the user-item matrix is a major reason why filtering algorithms fail to produce satisfactory results. Therefore, a number of techniques have been proposed to reduce the sparsity of the initial user-item matrix and thus improve the efficiency of the RS. Default Voting is the simplest technique used to reduce sparsity: a default rating value is inserted for items that have no rating value. This rating value is selected to be neutral or somewhat indicative of negative preferences for unseen items [7].

An extension of the Default Voting method is to use the User Average Scheme, the Item Average Scheme, or the Composite Scheme [39]. More specifically:

• In the User Average Scheme, for each user u, the average rating $\bar{R}(u)$ over all items rated by u is computed. This is the average of the corresponding row of the user-item matrix. The user average is then used to replace any missing $R(u,i)$ value. This approach is based on the idea that a user’s rating for a new item can be simply predicted from the same user’s past ratings.

• In the Item Average Scheme, for each item i, the average rating $\bar{R}(i)$ over all users is computed. This is the average of the corresponding column of the user-item matrix. The item average is then used as a fill-in for missing values $R(u,i)$ in the matrix.

• In the Composite Scheme, the collected information for both items and users contributes to the final result. The main idea behind this method is to use the average rating of user u as a base prediction and then add a correction term to it based on how the specific item i was rated by other users.
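The Default Voting, User Average and Item Average fill-in schemes can be sketched on a small user-item matrix in which `None` marks a missing rating. The list-of-lists representation, the helper names and the toy ratings are assumptions for illustration.

```python
def _mean(values):
    """Average of the non-missing entries, or None if all entries are missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

def fill_default(R, default):
    """Default Voting: every missing rating becomes the same default value."""
    return [[default if r is None else r for r in row] for row in R]

def fill_user_average(R):
    """User Average Scheme: a missing R(u, i) becomes the mean of row u."""
    filled = []
    for row in R:
        avg = _mean(row)
        filled.append([avg if r is None else r for r in row])
    return filled

def fill_item_average(R):
    """Item Average Scheme: a missing R(u, i) becomes the mean of column i."""
    col_avg = [_mean(col) for col in zip(*R)]
    return [[col_avg[j] if r is None else r for j, r in enumerate(row)] for row in R]

# Toy 3 x 3 user-item matrix (rows = users, columns = items); None = not rated.
R = [
    [5, None, 3],
    [4, 2, None],
    [None, 1, 1],
]
```

For example, the missing rating of the first user on the second item becomes 4.0 under the User Average Scheme and 1.5 under the Item Average Scheme. This illustrates how strongly the choice of scheme can affect the filled-in matrix.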

The Composite Scheme works as follows: when a missing entry for the rating of user u on item i is located, the user average $\bar{R}(u)$ is first calculated as the average of the corresponding row of the user-item matrix. Then, we search for existing ratings in the column that corresponds to item i. Assuming that a set of l users, $U = \{u_1, u_2, \ldots, u_l\}$, has provided a rating for item i, we can compute a correction term for each user $u_k \in U$ equal to $\delta_k = R(u_k,i) - \bar{R}(u_k)$. After the corrections for all users in U have been computed, the composite rating can be calculated as:

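$$R(u,i) = \begin{cases} \bar{R}(u) + \dfrac{\sum_{k=1}^{l} \delta_k}{l}, & \text{if user } u \text{ has not rated item } i,\\[6pt] R, & \text{if user } u \text{ has rated item } i \text{ with } R. \end{cases} \tag{2.2}$$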

An alternative way of utilizing the Composite Scheme is through a simple transposition: first compute the item average $\bar{R}(i)$ (i.e., the average of the column that corresponds to item i) and then compute the correction terms $\delta_k$ by scanning through all l items $I = \{i_1, i_2, \ldots, i_l\}$ rated by user u. The fill-in value of $R(u,i)$ would then be:

$$R(u,i) = \bar{R}(i) + \frac{\sum_{k=1}^{l} \delta_k}{l}, \tag{2.3}$$

where l is the number of items rated by user u, and the correction terms are computed for all items in I as $\delta_k = R(u,i_k) - \bar{R}(i_k)$.
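Equations (2.2) and (2.3) can be checked on a toy matrix with the sketch below, where `None` marks a missing rating. The helper names are hypothetical, and the code assumes that the relevant base average exists: user u has rated something for (2.2), and item i has been rated for (2.3). If no correction terms are available, it falls back to the base average alone.

```python
def _mean(values):
    """Average of the non-missing entries, or None if all entries are missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

def composite_user_based(R, u, i):
    """Eq. (2.2): mean of user u plus the average correction
    delta_k = R(u_k, i) - mean(u_k) over the users u_k who rated item i."""
    if R[u][i] is not None:
        return R[u][i]  # existing ratings are kept as they are
    base = _mean(R[u])
    if base is None:
        raise ValueError("user u has no ratings, so the user average is undefined")
    deltas = [row[i] - _mean(row) for row in R if row[i] is not None]
    return base + (sum(deltas) / len(deltas) if deltas else 0.0)

def composite_item_based(R, u, i):
    """Eq. (2.3): mean of item i plus the average correction
    delta_k = R(u, i_k) - mean(i_k) over the items i_k rated by user u."""
    if R[u][i] is not None:
        return R[u][i]
    columns = list(zip(*R))
    base = _mean(columns[i])
    if base is None:
        raise ValueError("item i has no ratings, so the item average is undefined")
    deltas = [R[u][k] - _mean(columns[k]) for k in range(len(columns)) if R[u][k] is not None]
    return base + (sum(deltas) / len(deltas) if deltas else 0.0)

# Toy 3 x 3 user-item matrix (rows = users, columns = items); None = not rated.
R = [
    [5, None, 3],
    [4, 2, None],
    [None, 1, 1],
]
```

For the first user and the second item, (2.2) starts from the user average 4 and subtracts 0.5, giving 3.5. Equation (2.3) starts from the item average 1.5 and adds 0.75, giving 2.25. The two orientations of the same scheme can therefore disagree noticeably on sparse data.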

After reducing the sparsity of the user-item matrix in this way, we can use a vector similarity metric to compute the proximity between users and hence to form neighborhoods of users [38], as discussed in the following.

2.2.1.2 Neighborhood Formation

In this step of the recommendation process, the similarity between users is calculated in the user-item matrix R, i.e., users similar to the active user $u_a$ form a proximity-based neighborhood with him/her. More specifically, neighborhood formation is implemented in two steps. Initially, the similarity between all the users in the user-item matrix R is calculated with the help of some proximity metrics. The second step is the actual neighborhood generation for the active user, where the similarities of users are processed in order to select those users that will constitute the neighborhood of the active user. To find the similarity between users $u_a$ and $u_b$, we can utilize the Pearson correlation metric. The Pearson correlation was initially introduced in the context of the GroupLens project [33, 43] as follows. Let us assume that a set of m users $u_k$, $k = 1, 2, \ldots, m$, $U_m = \{u_1, u_2, \ldots, u_m\}$, have provided ratings $R(u_k, i_l)$ for items $i_l$, $l = 1, 2, \ldots, n$, where $I_n = \{i_1, i_2, \ldots, i_n\}$ is the set of items.

The Pearson correlation coefficient is given by:

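$$\mathrm{sim}(u_a,u_b) = \frac{\sum_{l=1}^{n}\bigl(R(u_a,i_l)-\bar{R}(u_a)\bigr)\bigl(R(u_b,i_l)-\bar{R}(u_b)\bigr)}{\sqrt{\sum_{l=1}^{n}\bigl(R(u_a,i_l)-\bar{R}(u_a)\bigr)^2}\,\sqrt{\sum_{l=1}^{n}\bigl(R(u_b,i_l)-\bar{R}(u_b)\bigr)^2}}. \tag{2.4}$$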
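A direct transcription of (2.4) for users stored as {item: rating} dictionaries is sketched below. Since real rating matrices are sparse, it assumes a common convention: the sums run only over items rated by both users, while the means are each user's average over all of his/her ratings. Returning 0.0 when there are no co-rated items or a denominator vanishes is also a convention chosen for the sketch.

```python
import math

def pearson(ra, rb):
    """Eq. (2.4) for two users given as {item: rating} dicts.
    Sums run over co-rated items; each mean is over that user's own ratings."""
    common = ra.keys() & rb.keys()
    if not common:
        return 0.0
    ma = sum(ra.values()) / len(ra)
    mb = sum(rb.values()) / len(rb)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    da = math.sqrt(sum((ra[i] - ma) ** 2 for i in common))
    db = math.sqrt(sum((rb[i] - mb) ** 2 for i in common))
    return num / (da * db) if da and db else 0.0

# Hypothetical users: ub follows ua's ups and downs around a different average, uc reverses them.
ua = {"i1": 5, "i2": 3, "i3": 1}
ub = {"i1": 5, "i2": 4, "i3": 3}
uc = {"i1": 1, "i2": 3, "i3": 5}
```

Subtracting each user's mean makes the metric insensitive to users who rate systematically higher or lower than others. This is why ua and ub come out perfectly correlated despite their different averages.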

Another similarity metric uses the cosine-based approach [7], according to which the two users $u_a$ and $u_b$ are considered as two vectors in the n-dimensional item space, where $n = |I_n|$. The similarity between two vectors can be measured by computing the cosine of the angle between them:

$$\mathrm{sim}(u_a,u_b) = \cos(\vec{u}_a,\vec{u}_b) = \frac{\sum_{l=1}^{n} R(u_a,i_l)\,R(u_b,i_l)}{\sqrt{\sum_{l=1}^{n} R(u_a,i_l)^2}\,\sqrt{\sum_{l=1}^{n} R(u_b,i_l)^2}}. \tag{2.5}$$
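Equation (2.5) can be sketched in the same {item: rating} dictionary representation. The sketch assumes that a missing rating counts as 0 in the item-space vector, which is one common convention; the toy users are hypothetical.

```python
import math

def cosine_users(ra, rb, items):
    """Eq. (2.5): users as vectors over `items`; a missing rating counts as 0."""
    va = [ra.get(i, 0.0) for i in items]
    vb = [rb.get(i, 0.0) for i in items]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    return dot / (na * nb) if na and nb else 0.0

items = ["i1", "i2", "i3"]
ua = {"i1": 5, "i2": 3, "i3": 1}
ub = {"i1": 5, "i2": 4, "i3": 3}
similarity = cosine_users(ua, ub, items)
```

Because the ratings are not mean-centered, two users whose ratings rise and fall together around different averages, like the toy users above, get a cosine of about 0.96 rather than 1. This gives one possible intuition for the comparison with the Pearson correlation in the next paragraph.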

In RSs, the Pearson correlation has been found to perform better than the cosine similarity as a metric of the proximity among users [7].

At this point in the recommendation process, a single user, called the active user, is selected. The active user is the user for whom the RS will produce predictions, and the next task is to generate his/her neighborhood of users. A similarity matrix S is generated, containing the similarity values between all users. For example, the i-th row of the similarity matrix represents the similarity between user $u_i$ and all the other users. From this similarity matrix S, various schemes can be used to select the users that are most similar to the active user. One such scheme is the center-based scheme, in which the users with the highest similarity values in the row of the active user $u_a$ are selected.
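The center-based scheme amounts to a top-k selection in the active user's row of S, excluding the active user. The sketch assumes that S is a dense list of lists in which larger values mean more similar; the matrix values and k are illustrative.

```python
def center_based_neighbors(S, active, k):
    """Return the k users (by index) with the highest similarity to `active`
    in row S[active], excluding the active user; ties keep index order."""
    candidates = [(sim, u) for u, sim in enumerate(S[active]) if u != active]
    candidates.sort(key=lambda pair: pair[0], reverse=True)  # stable sort
    return [u for _, u in candidates[:k]]

# Hypothetical symmetric similarity matrix for four users.
S = [
    [1.0, 0.2, 0.9, -0.3],
    [0.2, 1.0, 0.1, 0.5],
    [0.9, 0.1, 1.0, 0.4],
    [-0.3, 0.5, 0.4, 1.0],
]
```

Only the active user's row is inspected, so the neighbors are chosen purely by their individual similarity to that user.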

Another scheme for neighborhood formation is the aggregate neighborhood formation scheme. In this scheme, a neighborhood of users is created by finding users who are closest to the centroid of the current neighborhood, rather than users who are closest to the active user himself/herself. This scheme allows all users to take part in the formation of the neighborhood, as they are gradually selected and added to it.
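The aggregate scheme can be sketched as follows, using one common formulation that is assumed here. The first neighbor is the user most similar to the active user, and each further neighbor is the remaining user most similar to the centroid of the neighbors selected so far. Cosine similarity on rating vectors and the toy vectors are assumptions for illustration.

```python
import math

def _cos(a, b):
    """Cosine similarity of two rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def aggregate_neighbors(vectors, active, size):
    """Grow a neighborhood: first the user closest to the active user, then,
    repeatedly, the remaining user closest to the centroid of the neighborhood."""
    others = [u for u in range(len(vectors)) if u != active]
    first = max(others, key=lambda u: _cos(vectors[active], vectors[u]))
    hood = [first]
    dim = len(vectors[active])
    while len(hood) < min(size, len(others)):
        centroid = [sum(vectors[u][d] for u in hood) / len(hood) for d in range(dim)]
        rest = [u for u in others if u not in hood]
        hood.append(max(rest, key=lambda u: _cos(centroid, vectors[u])))
    return hood

# Hypothetical rating vectors; user 0 is the active user.
vectors = [[5, 0, 1], [4, 1, 1], [0, 5, 4], [5, 3, 0]]
```

After the first pick, later neighbors are compared with the growing centroid rather than with the active user. The neighborhood therefore reflects the group's collective taste as it forms.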

2.2.1.3 Generation of Recommendations

The generation of recommendations consists of predicting a rating, i.e., computing a numerical value that constitutes a predicted opinion of the active user $u_a$ for an item $i_j$ unseen by him/her. This predicted value should lie within the same accepted numerical scale as the other ratings in the initial user-item matrix R. Only those users that lie within the neighborhood of the active user participate in the generation of predictions. In other words, only a subset of k users participate


To my beloved family and friends

Aristomenis S. Lampropoulos

To my wife and colleague, Prof.-Dr. Maria Virvou, and our daughters, Evina, Konstantina and Andreani

George A. Tsihrintzis

Contents

1 Introduction

1.1 Introduction to Recommender Systems

1.2 Formulation of the Recommendation Problem

1.2.1 The Input to a Recommender System

1.2.2 The Output of a Recommender System

1.3 Methods of Collecting Knowledge About User Preferences

1.3.1 The Implicit Approach

1.3.2 The Explicit Approach

1.3.3 The Mixing Approach

1.4 Motivation of the Book

1.5 Contribution of the Book

1.6 Outline of the Book

References

2 Review of Previous Work Related to Recommender Systems

2.1 Content-Based Methods

2.2 Collaborative Methods

2.2.1 User-Based Collaborative Filtering Systems