Improving Social Recommendations by applying a Personalized Item Clustering policy
Georgios Alexandridis
School of Electrical and Computer Engineering National Technical University
of Athens
Zografou, 157 80, Greece
[email protected]
Georgios Siolas
School of Electrical and Computer Engineering National Technical University
of Athens
Zografou, 157 80, Greece
[email protected]
Andreas Stafylopatis
School of Electrical and Computer Engineering National Technical University
of Athens
Zografou, 157 80, Greece
[email protected]
ABSTRACT
In online Recommender Systems, people tend to consume and rate items that are not necessarily similar to one another. This phenomenon is a direct consequence of the fact that human taste is influenced by many factors that cannot be captured by pure Content-based or Collaborative Filtering approaches. For this reason, a desirable property of Recommender Systems would be to identify correlations between seemingly different items that might be of interest to a particular user. This course of action is expected to improve the novelty and the diversity of the recommendations and therefore increase user satisfaction.
In this paper, we address this problem by proposing a socially-aware personalized item clustering recommendation algorithm. We try to locate patterns between the items that a user has evaluated by grouping them into different clusters according to the rating behavior of the members of his Personal Network, which includes the individuals in his direct social network and those other persons with whom the user shares a similar item evaluation behavior. Once the clustering phase has been completed, we use each cluster’s members as seed items in order to construct an item consumption network. Then, by performing a random walk on the aforementioned network, we are able to produce recommendations that are accurate and at the same time novel and diverse. Preliminary results reveal the potential of this idea.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Clustering;
G.3 [Probability and Statistics]: Stochastic processes;
F.1.2 [Modes of Computation]: Probabilistic computation
Keywords
social recommendation, personal network, spectral clustering, item consumption network, random walk
1. INTRODUCTION
Research on Recommender Systems (RS) has grown almost exponentially over the past years. Initial approaches tried to model the user to item interaction evident in every RS in a plethora of ways; collaborative filtering algorithms exploited the said relationship in order to infer similarities and dissimilarities in taste between users that are not otherwise related to one another (implicit user to user interaction). On the other hand, content-based approaches tried to estimate item relevance by accumulating all available evaluations for each item and then comparing them in some metric space. Lastly, hybrid systems combined both approaches, along with other possible sources of information (e.g. in the form of metadata) in an effort to produce better and more meaningful recommendations.
It is an indisputable fact that the recommendation process has an inherent social dimension as well. Apart from the fact that opinion and taste are formulated by a person’s interaction with his environment, it is also a common everyday practice to turn to family, friends and acquaintances when we want to make a decision or a purchase. It could be further elaborated that we also tend to affiliate and establish bonds with people that we share the same interests with.
The emergence of Web 2.0 technologies and the advent of Online Social Networks (OSNs) introduced social relationships into the digital era and inevitably into the recommendation process, bringing about the so-called Social Recommender Systems (SRS). The user to item interaction of traditional RS has been extended in order to include explicit user to user connections. Those ties are extremely useful for the recommendation process, as one of the most fundamental characteristics of social networks is Homophily [14]. This term means that the members of a social network tend to be more similar to those individuals they are connected with than to other persons with whom they don’t share direct relationships. Or, more strictly speaking, the acquaintances of any given member of an OSN do not constitute random samples drawn from the whole of the underlying population. This observation has also been experimentally confirmed by the work of Singla and Richardson [13].
Homophily is influenced by many factors such as age, gender, educational level, ethnicity etc. In our work, we plan to utilize this phenomenon in a novel approach by trying to examine the extent to which an individual’s interests are influenced by the people he knows. We start by constructing a personal network for each user; one that includes the persons he has some interactions with, both direct and indirect. This personal network is the basis for the item clustering algorithm that places the items that this user has already consumed in different clusters, according to the consumption behavior of his peers. Once the clustering phase has been completed, the items of each cluster are expected to share some common characteristics. Therefore, by using them as seed items and by performing a random walk on an item consumption network specially constructed for this purpose, we will be able to get recommendations that will be accurate, novel and diverse.
2. RELATED WORK
The application of clustering methodologies in the traditional RS domain is not new [1]. Researchers have been using them in order to uncover the implicit relationships between users (or items) based on the available evaluations. In most cases, the purpose of clustering is to aid or to substitute user and item neighborhood formation.
The fusion of explicit user ties, in the form of social information, into the clustering process is a relatively novel research field. Most approaches use them as an alternative information source, upon which they apply existing clustering algorithms. For example, the authors of [4] propose a methodology to compute the inferred trust value between any pair of users and then perform a correlation clustering of the user space based on those values. They introduce their method into traditional memory-based algorithms (CF and trust-based) and witness a relative improvement in the recommendation accuracy.
In [5], the authors apply the affinity propagation clustering algorithm for the dynamic identification of the user clusters. The measure they employ is the Jaccard coefficient, computed over the neighbors that a pair of users have in common in the trust network. Their experiments have shown that their system outperforms other traditional clustering techniques (such as k-means) in terms of the accuracy of the recommendations as the number of user clusters grows.
In [12], the authors perform a hierarchical clustering of the user’s social relationships in order to form user neighborhoods. They use modularity, a graph-based criterion, to stop the cluster formation. In short, modularity counts the fraction of the in-cluster edges against the number of edges pointing to other clusters. In the next step, those neighborhoods are fed into user-based and item-based CF algorithms that produce the recommendations. They have tested their approach on two different datasets and have witnessed improvement both in the accuracy and the usefulness of the recommendations as measured by metrics such as precision and recall.
Our approach differs from those mentioned above in three key aspects; firstly, the clustering level is local (personalized) instead of system-wide, rendering our algorithm more scalable in terms of the dimensionality of the data involved in the computations. Secondly, while most approaches are geared towards grouping similar users, our approach is focused on locating similar items through the identification of common access patterns in the personal network. Finally, the social network information is used in the filtering phase rather than the clustering algorithm itself.
3. THE PERSONAL NETWORK
So far, a person’s taste has been discussed in relation to other people. However, taste itself is also a multidimensional concept; after all, most people are not confined to having a single interest only. For example, if someone has an interest in gardening and in playing music, she is expected to evaluate gardening and music books. And since not all people who are interested in gardening are also interested in music, her rating behavior might seem bizarre from the RS point of view; indeed, even in large scale RS, few people would be expected to rate items of both the aforementioned categories.
We may further elaborate on this example by considering that this user’s social network would consist of roughly two categories of people; those she’s met via her hobbies and other, more general acquaintances. It is therefore quite likely that the target user and some of her gardening peers would have evaluated some gardening items in common. Our intuition is to try to locate such relationships in the RS and it is for this reason that we are performing a personalized clustering of the consumed items. The personalized clustering in this sense means that not every item evaluation in the RS is taken into account. Instead, they are filtered by those users that are explicitly (through the social network) or implicitly (through similar consumption behavior) related to the target user.
Our proposed algorithm works in two stages; initially, the ratings of a specific user above a certain relevance threshold are retrieved. Then, these items are put into one or more different clusters according to some well-defined criteria. Next, the items of each cluster are used as a seed in order to construct an item consumption network. This structure is traversed in a random walk fashion for the recommendation of new items to the user.
3.1 Relationship Types
The explicit user to user ties that are evident in OSN in general and in SRS in particular lead to a variety of connections among users. It is therefore necessary to define the relationship types that would be considered valid for the creation of their personal network. The discipline of social network analysis examines simple and advanced structures present in social networks [15]. However, the nature of our research places the emphasis on Dyadic Relationships. They constitute the most common relationship type in OSN and they directly involve two persons or actors. Those links may either be symmetric (or bidirectional) as in the case of Friendship or non-symmetric (directional). The most common directional link in SRS is Trust, where a user expresses his opinion on another user’s behavior. These remarks are usually private to the user who has issued them and signify how useful she has found the interaction with the other user. Trust statements are either binary (i.e. trust/distrust) or assume a broader set of values (usually in the [0,1] interval). The SRS that incorporate trust statements in the recommendation process are also known as Trust-aware Recommender Systems.
3.2 Network Formation
As has been stated before, we are interested in providing a personalized filtering of the available items in the RS for each specific user. The basis of our approach is the Personal Network (PN) of each user, which is constructed from two different but not necessarily distinct pools of users; those that are part of the target user’s social network and those that are similar (defined by some notion of similarity) to him. Members of the personal network of a given user may be further categorized in the following groups:
• Users in the direct social network of the target user that bear a similarity to her
• Other similar users
• Other users in the social network (Friend-of-a-Friend scheme) that might be similar to him
• Other users in the social network

Figure 1: Personal Network (target user $u_t$ and peers $u_1$ to $u_7$; edges carry trust weights $t$ and similarity weights $s$)
The most widely adopted indices of similarity in RS are the Pearson correlation coefficient, the cosine similarity and the Manhattan similarity. The first two measure similarity quite well when there is a large overlap between the items rated by different users, while the latter is more suitable in those cases where the overlap in ratings between different users is small.
Generally, the social and the similarity graphs could be accessed in a number of ways and indeed there is a wealth of methodologies in the SRS literature. As an initial approach, in order to estimate user proximity, we opted for weighted path counting criteria that are dependent on the structure of the PN. For example, in Figure 1, $u_3$ is considered to be the “closest” user to the target user $u_t$ since he belongs to his direct social network (edge weight $t_3$), to the direct social network of $u_t$’s friend $u_2$ (edge weight $t_{2,1}$) and finally to $u_t$’s similarity network (edge weight $s_3$). Next, users $u_4$ and $u_7$ may be reached by two simple paths of lengths one and two that originate from $u_t$. In order to estimate who is most proximate to him, we compute a total value of each path by accumulating the respective weights.
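The path counting idea can be illustrated with a minimal Python sketch. The text does not specify how the weights along a path are accumulated, so the sketch assumes that a path's value is the product of its edge weights and that a peer's proximity is the sum over all simple paths of at most two edges. The function name `proximity_scores` and all edge weights are hypothetical.

```python
def proximity_scores(graph, target, max_len=2):
    """Accumulate path values from `target` to every reachable peer.

    `graph` maps a user to a list of (neighbour, weight) edges; parallel
    edges (e.g. one trust edge and one similarity edge) are listed twice.
    Assumption: a path's value is the product of its edge weights, and a
    peer's score is the sum over all simple paths of at most `max_len` edges.
    """
    scores = {}

    def walk(node, value, visited, depth):
        for neighbour, weight in graph.get(node, []):
            if neighbour in visited:
                continue  # keep paths simple (no repeated users)
            path_value = value * weight
            scores[neighbour] = scores.get(neighbour, 0.0) + path_value
            if depth + 1 < max_len:
                walk(neighbour, path_value, visited | {neighbour}, depth + 1)

    walk(target, 1.0, {target}, 0)
    return scores


# Hypothetical weights, loosely following the layout of Figure 1.
pn = {
    "ut": [("u2", 0.9), ("u3", 0.8), ("u3", 0.5)],  # trust and similarity edges to u3
    "u2": [("u3", 0.7), ("u4", 0.4)],
    "u3": [("u7", 0.6)],
}
scores = proximity_scores(pn, "ut")
closest = max(scores, key=scores.get)
print(closest, scores)
```

Under these assumed weights the peer reached through the most paths accumulates the largest score, which mirrors the argument made for $u_3$ in the text.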
4. PERSONALIZED CLUSTERING
4.1 The Item-to-Item Adjacency Matrix
After determining the importance of each member in a user’s PN, we proceed to create the item matrix $A$. Let $I_t$ be the items that have already been evaluated by target user $u_t$ above a given relevance threshold $r_{rel}$ (that might be dependent on $u_t$). Then $A$ is the $n \times n$ adjacency matrix (where $n = |I_t|$) whose $a_{i,j}$ and $a_{j,i}$ elements denote the frequency with which items $i, j \in I_t$ have been accessed together (above the relevance threshold) by members of $u_t$’s PN. Equation 1 displays an example item matrix for a user who has evaluated 7 items.

Figure 2: Item clusters of the adjacency matrix defined in Equation 1
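$$
A = \begin{array}{c|ccccccc}
 & i_1 & i_2 & i_3 & i_4 & i_5 & i_6 & i_7 \\
\hline
i_1 & 0 & 0 & 0 & 3 & 0 & 4 & 0 \\
i_2 & 0 & 0 & 0 & 0 & 3 & 0 & 1 \\
i_3 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\
i_4 & 3 & 0 & 0 & 0 & 0 & 8 & 0 \\
i_5 & 0 & 3 & 0 & 0 & 0 & 0 & 4 \\
i_6 & 4 & 0 & 2 & 8 & 0 & 0 & 0 \\
i_7 & 0 & 1 & 0 & 0 & 4 & 0 & 0
\end{array}
\tag{1}
$$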
It should be noted that since not all users in the PN are equally proximate to the target user, the elements of the matrix are not integer sums in the general case; indeed the contribution of each peer in the frequency of the access pattern between two items is not constant (i.e. 1) but it is actually scaled according to his proximity to the target user (subsection 3.2).
By definition, matrix $A$ is symmetric and may be viewed as the adjacency matrix of an undirected graph whose nodes are the items already evaluated by $u_t$ and whose edges represent evaluation patterns on the same itemset by users in $u_t$’s PN. Naturally, one would expect some items to be accessed together more often than others and this phenomenon is reflected on the graph by the formation of item clusters (Figure 2). We wish to distinguish those clusters and for this reason we apply a spectral clustering algorithm [11] on the matrix $A$ that is outlined in the following subsection.
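As an illustration of how such a matrix might be assembled, the following Python sketch adds each peer's proximity weight to every pair of the target user's items that the peer has also rated at or above the threshold. The function name `item_adjacency`, the data layout and the use of `>=` for "above the relevance threshold" are assumptions made for the example, not details taken from the text.

```python
from itertools import combinations


def item_adjacency(target_items, peer_ratings, proximity, r_rel):
    """Build the symmetric co-access matrix A over the target user's items.

    target_items: items the target user rated above the threshold (I_t)
    peer_ratings: {peer: {item: rating}} for the members of the PN
    proximity:    {peer: weight} scaling each peer's contribution
    r_rel:        relevance threshold (ratings >= r_rel count; assumption)
    """
    pos = {item: k for k, item in enumerate(target_items)}
    n = len(target_items)
    A = [[0.0] * n for _ in range(n)]
    for peer, ratings in peer_ratings.items():
        liked = [i for i in target_items if ratings.get(i, 0) >= r_rel]
        for i, j in combinations(liked, 2):
            # Each co-access counts once per peer, scaled by proximity.
            A[pos[i]][pos[j]] += proximity[peer]
            A[pos[j]][pos[i]] += proximity[peer]
    return A


# Hypothetical data: two peers with different proximity to the target user.
items = ["i1", "i4", "i6"]
peers = {
    "p1": {"i1": 5, "i4": 4, "i6": 5},
    "p2": {"i1": 2, "i4": 5, "i6": 4},
}
A = item_adjacency(items, peers, {"p1": 1.0, "p2": 0.5}, r_rel=4)
print(A)
```

Because the second peer contributes with weight 0.5, the resulting entries are not integers, which is the effect described in the paragraph on proximity scaling.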
4.2 The Clustering Algorithm
For the given symmetric item adjacency matrix $A$:
1. Compute the diagonal degree matrix $D$ whose elements are $d_{i,i} = \sum_j a_{i,j}$.
2. Compute the normalized symmetric Laplacian matrix $L = I - D^{-1/2} A D^{-1/2}$.
Figure 3: Item Consumption Network (cluster items $s_1$(10), $s_2$(20), $s_3$(15); other items $i_1$(50), $i_2$(30), $i_3$(40), $i_4$(80))
3. Compute the $n$ eigenvalues of $L$, $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$, and the corresponding eigenvectors $v_1, \dots, v_n$. Since $L$ is symmetric, all its eigenvalues are real, but not necessarily different; that is, some eigenvalues might appear more than once, or more formally, have a multiplicity larger than 1.
4. According to the Perron-Frobenius theorem, the smallest eigenvalue of the Laplacian of a non-negative symmetric matrix is always 0 and its multiplicity $k$ equals the number of connected components of the graph induced by the similarity matrix $A$. We set $k$ as the number of the desired item clusters.
5. Construct the $n \times k$ matrix $U$ whose column vectors are the eigenvectors $v_1, \dots, v_k$ of $L$ that correspond to its $k$ smallest eigenvalues.
6. Cluster the $n$ row vectors of $U$ into $k$ clusters ($C_1, \dots, C_k$) using an appropriate clustering methodology (e.g. k-means clustering). Then place each item in its corresponding cluster by using a distance criterion.
The eigen decomposition of step 3 is not a time consuming task, since in the overwhelming majority of the cases, matrix $A$ (and consequently matrix $L$) is of low dimensionality (i.e. $n < 100$).
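The steps of this subsection can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' implementation: it uses the eigenvectors associated with the $k$ smallest eigenvalues of $L = I - D^{-1/2} A D^{-1/2}$, normalizes the rows of $U$ to unit length (a step taken from [11] that is not listed above), and replaces a library k-means with a small deterministic variant using farthest-point initialization. The input is the item graph of Figure 2, and the function name `spectral_item_clusters` is hypothetical.

```python
import numpy as np


def spectral_item_clusters(A, tol=1e-9, n_iter=20):
    """Return (k, labels) for a symmetric non-negative item matrix A.

    Assumes every item has at least one co-access (no zero-degree rows).
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # step 1: degrees d_ii
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # step 2
    eigvals, eigvecs = np.linalg.eigh(L)   # step 3: ascending eigenvalues
    k = int(np.sum(np.abs(eigvals) < tol)) # step 4: multiplicity of eigenvalue 0
    U = eigvecs[:, :k]                     # step 5: eigenvectors of the k smallest
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row normalization as in [11]

    # Step 6: tiny k-means with deterministic farthest-point initialization.
    centers = [U[0]]
    while len(centers) < k:
        dists = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(dists))])
    centers = np.array(centers)
    for _ in range(n_iter):
        gaps = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(gaps, axis=1)
        centers = np.array([
            U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
    return k, labels.tolist()


# Item graph of Figure 2 (rows and columns ordered i1 .. i7).
A_example = [
    [0, 0, 0, 3, 0, 4, 0],
    [0, 0, 0, 0, 3, 0, 1],
    [0, 0, 0, 0, 0, 2, 0],
    [3, 0, 0, 0, 0, 8, 0],
    [0, 3, 0, 0, 0, 0, 4],
    [4, 0, 2, 8, 0, 0, 0],
    [0, 1, 0, 0, 4, 0, 0],
]
k, labels = spectral_item_clusters(A_example)
print(k, labels)
```

When the graph splits into disconnected components, as in this example, the normalized rows of $U$ coincide within a component, so any reasonable clustering of the rows recovers the components.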
5. THE ITEM CONSUMPTION NETWORK
For each cluster created in the previous step of our algorithm, we construct an Item Consumption Network (ICN) [7]. An example of such a network is illustrated in Figure 3. The black dotted nodes $s_1$ to $s_3$ are all members of the same cluster (interconnected with continuous edges). The gray nodes $i_1$ to $i_4$ represent other items that have been evaluated by members of the target user’s PN but not by the target user herself. An edge from a node of the former category to a node in the latter represents the fact that these two items have been accessed together by members of $u_t$’s PN the number of times indicated by the weight of the respective edge. In the aforementioned figure, items $s_2$ and $i_1$ have been evaluated together by 4 peers in $u_t$’s PN. Finally, the number in parentheses at each node shows the number of evaluations this particular item has received from the members of the PN.
In order to produce recommendations, the ICN is modeled as a graph with the purpose of performing a modified random walk on it. Indeed, the ICN graph has the properties of being connected and non-bipartite and for this reason it may be viewed as a symmetric time-reversible finite Markov chain [8]. The symmetric property is easily deduced from the fact that the graph is undirected (Figure 3). The term modified is a consequence of the fact that the transition probability $p_{i,j}$ is not set to be inversely proportional to the degree of the current node, as in uniform random walks. Instead, it is computed from the edge weight and the number of evaluations of both the current and the following node in the walk.
A fundamental property of the random walks on finite Markov chains is that they reach their steady state distribution $\pi$ regardless of the starting node [8]. If $M$ is the $m \times m$ symmetric transition matrix of the chain (where $m$ is the cardinality of the itemset in the ICN) such that $M = (p_{i,j})$, $i, j \in V$, $(i, j) \in E$, then the following equation holds true
$$M^T \pi = \pi \tag{2}$$
which may be interpreted as the fact that the matrix $M$ has its largest eigenvalue equal to 1 and that the corresponding left eigenvector is the steady state distribution $\pi$. Consequently, if we perform an eigen decomposition on matrix $M$, we are able to retrieve the steady state vector $\pi$ whose entries correspond to the probability of each item node being visited by the random walk.
In some cases, however, the dimensionality of M could be large and therefore eigen decomposition may become a time consuming task. For this reason, we follow the iterative approach outlined below in order to approximateπ[8]:
1. Create the $m \times 1$ vector $r$ and set its entries to 0, apart from those which correspond to the seed items, which assume the value $\frac{1}{|S|}$ ($|S|$ is the cardinality of the seed itemset).
2. Compute $w = M \times r$.
3. While $dist(w, r) > \epsilon$. Here $dist(w, r)$ denotes a function that computes vector distance in a metric space (e.g. Euclidean distance) and $\epsilon$ is the vector similarity threshold (e.g. $\epsilon = 10^{-4}$).
(a) Set $r = w$
(b) Compute $w = M \times r$
4. End While
5. Return as recommendations the $N$ non-seed nodes with the largest probability value in $w$.
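The iteration above can be sketched in plain Python. Since the exact form of the modified transition probability is not given, the sketch assumes the standard weighted walk $p_{i,j} = w_{i,j} / \sum_k w_{i,k}$, which makes $M$ row-stochastic rather than symmetric; the update is therefore written as $w = M^T r$ so that $r$ stays a probability distribution (for a symmetric $M$ this coincides with $w = M \times r$). The graph, its weights and the function names are hypothetical.

```python
import math


def transition_matrix(nodes, edges):
    """Row-stochastic matrix with p[i][j] = weight(i, j) / weighted degree of i
    (assumed form; the paper's modified probabilities are not specified)."""
    index = {name: k for k, name in enumerate(nodes)}
    m = len(nodes)
    weights = [[0.0] * m for _ in range(m)]
    for (a, b), w in edges.items():
        weights[index[a]][index[b]] = w
        weights[index[b]][index[a]] = w  # undirected graph
    return [[w / sum(row) for w in row] for row in weights]


def steady_state(M, seed_idx, eps=1e-4, max_iter=10000):
    """Iterate w = M^T r from the seed distribution until dist(w, r) <= eps."""
    m = len(M)
    r = [1.0 / len(seed_idx) if k in seed_idx else 0.0 for k in range(m)]
    for _ in range(max_iter):
        w = [sum(r[i] * M[i][j] for i in range(m)) for j in range(m)]
        if math.sqrt(sum((a - b) ** 2 for a, b in zip(w, r))) <= eps:
            return w
        r = w
    return r


def recommend(nodes, edges, seeds, top_n):
    """Return the top_n non-seed nodes by visiting probability, plus the vector."""
    M = transition_matrix(nodes, edges)
    seed_idx = {nodes.index(s) for s in seeds}
    w = steady_state(M, seed_idx)
    candidates = [k for k in range(len(nodes)) if k not in seed_idx]
    ranked = sorted(candidates, key=lambda k: w[k], reverse=True)
    return [nodes[k] for k in ranked[:top_n]], w


# Hypothetical connected, non-bipartite graph with two seed items.
nodes = ["s1", "s2", "i1", "i2"]
edges = {("s1", "s2"): 3, ("s1", "i1"): 2, ("s2", "i1"): 4, ("i1", "i2"): 5}
top, w = recommend(nodes, edges, seeds=["s1", "s2"], top_n=1)
print(top, w)
```

For this kind of weighted walk on a connected, non-bipartite undirected graph, the limiting probability of a node is proportional to its weighted degree, which gives a simple way to sanity-check the iteration.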
6. EXPERIMENTS
We performed a series of initial experiments with our algorithm and with other reference content-based, collaborative and trust-based systems on the Epinions dataset crawled from the Epinions website by Massa and Bhattacharjee [10]. Table 1 outlines the characteristics of this dataset. It should be noted that trust statements in this case are unweighted (binary). A first remark is that the dataset is extremely sparse both in terms of the ratings’ density and of the connectedness of the trust network as measured by the global clustering coefficient. Besides sparsity, the dataset contains a large proportion of users and items with too few ratings. These characteristics greatly affect the quality of the produced recommendations.

Table 1: The Epinions Dataset [10]
Users                                   49290
Items                                   139738
Ratings                                 664824
Ratings’ Density                        0.01%
Trust Statements                        487182
Global Clustering Coefficient (Trust)   0.0002
Another peculiarity of this specific dataset is that the ratings of the items are not uniformly distributed in the rating scale (1 to 5) but are skewed towards the upper scale (4 and 5) by a ratio of 1 to 3. This is attributed to the behavioral phenomenon of users not rating items they’ve consumed in general, but of predominantly rating items they have both consumed and liked. Therefore, any RS that would blindly recommend an item with a preference rating between 4 and 5 would exhibit satisfactory performance. This observation is both an indication that RS efficiency should not solely rely on optimizing the accuracy of the predicted preference value and a justification of examining other aspects of RS performance, such as the novelty and the diversity of the recommendations (that will be discussed below).
Since one of our stated goals has been the production of accurate recommendations, we have chosen to evaluate RS performance on the Statistical Accuracy metrics. Their purpose is to measure how close the recommendation $\hat{r}_{u,i}$ is to the actual rating $r_{u,i}$. The most widely used metric in this sense is the Root Mean Square Error (RMSE), which is defined over a Test Set $T$ as:
$$RMSE = \sqrt{\frac{1}{|T|} \sum_{n=1}^{|T|} (\hat{r}_{u,i} - r_{u,i})^2} \tag{3}$$
where $|T|$ is the cardinality of the test set.
Another important evaluation metric of RS is the Ratings’ Coverage; that is, the percentage of ratings for which the system manages to produce recommendations. It should be pointed out that a RS which exhibits satisfactory results in the statistical accuracy metrics is still considered to perform poorly if it manages to produce recommendations only for a handful of users or items. More formally, the Ratings’ Coverage is defined as
$$Coverage = 100 \frac{|T_R|}{|T|} \tag{4}$$
where $|T_R|$ is the cardinality of the set of the items for which the RS produced recommendations (generally, $T_R \subseteq T$).
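A small Python sketch of the accuracy and coverage metrics follows. It assumes that the test set is a dictionary keyed by (user, item) pairs and that RMSE is computed only over the pairs for which a prediction could be produced; the text does not state how uncovered pairs are treated, so this is an assumption, as are the function names and the toy data.

```python
import math


def rmse(test, predictions):
    """Root mean square error over the test pairs that received a prediction."""
    covered = [key for key in test if key in predictions]
    squared = sum((predictions[key] - test[key]) ** 2 for key in covered)
    return math.sqrt(squared / len(covered))


def coverage(test, predictions):
    """Percentage of test ratings for which a prediction was produced."""
    covered = sum(1 for key in test if key in predictions)
    return 100.0 * covered / len(test)


# Hypothetical test ratings and predictions; ("u2", "c") could not be predicted.
test = {("u1", "a"): 4, ("u1", "b"): 5, ("u2", "a"): 3, ("u2", "c"): 2}
predictions = {("u1", "a"): 4.5, ("u1", "b"): 4.0, ("u2", "a"): 3.0}
print(rmse(test, predictions), coverage(test, predictions))
```

Reporting both numbers side by side makes the trade-off visible: a system can lower its RMSE simply by declining to predict difficult ratings, at the cost of coverage.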
Our objective has also been to measure the quality of the recommendations in terms of how novel the new items are, compared to what the target user has already evaluated. Clearly, a RS that makes obvious recommendations is considered to be of low performance even if it is accurate. In our experiments, we used the definition of Distance-based Item Novelty [3], which models the novelty of each item (in a list of recommended items) with respect to all the other already evaluated items on a Euclidean space. More specifically, it is defined as the average distance between the item $i$ at hand and the other already consumed items (that form the set $S$):
$$novelty(i|S) = \sum_{j \in S} p(j|S)\, d(i, j) \tag{5}$$
where $p(j|S)$ denotes the probability of item $j$ to have been already evaluated and $d(i, j)$ is the distance measure. Most commonly, distance is related to similarity via the relation $d(i, j) = 1 - sim(i, j)$. The similarity measure can be any of the similarity measures mentioned before (Pearson, cosine, etc). In our experiments, we used the Manhattan distance measure which is defined as the absolute distance between items $i$ and $j$ (Equation 6)
$$d(i, j) = |r_{u,i} - \hat{r}_{u,i}| \tag{6}$$
The last evaluation measure we have examined is diversity. Diversity measures how different the recommended items in a list are from one another. Ideally, a RS that is able to capture the multitude of human taste should be equally able to propose items that would fulfill most of the interests of its users. We used the definition of Intra-List Diversity [3]
$$diversity(R|u) = \frac{2}{|R|(|R|+1)} \sum_{k<n} d(i_k, i_n) \tag{7}$$
where $R$ is the list of $n$ recommended items ($n = |R|$) and $i_k \in R$.
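Equations 5 and 7 can be sketched in Python as follows. The sketch assumes a uniform $p(j|S) = 1/|S|$, which matches the "average distance" reading given in the text, and it takes the pairwise distance as a caller-supplied function. The normalizer follows Equation 7 as printed. The lookup table of distances and the function names are hypothetical.

```python
from itertools import combinations


def novelty(item, consumed, dist):
    """Equation 5 with uniform p(j|S) = 1/|S| (assumption): mean distance
    between `item` and the already consumed items."""
    return sum(dist(item, j) for j in consumed) / len(consumed)


def intra_list_diversity(rec_list, dist):
    """Equation 7 as printed: 2 / (|R| (|R| + 1)) times the sum of pairwise
    distances over all unordered pairs of recommended items."""
    n = len(rec_list)
    total = sum(dist(a, b) for a, b in combinations(rec_list, 2))
    return 2.0 * total / (n * (n + 1))


# Hypothetical symmetric distances in [0, 1] between three items.
table = {
    frozenset(("a", "b")): 0.2,
    frozenset(("a", "c")): 0.8,
    frozenset(("b", "c")): 0.6,
}


def dist(i, j):
    return table[frozenset((i, j))]


print(novelty("c", ["a", "b"], dist))
print(intra_list_diversity(["a", "b", "c"], dist))
```

Passing the distance as a function keeps the two metrics independent of the particular similarity measure (Pearson, cosine or Manhattan) chosen for the experiments.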
6.1 Other Systems
In order to thoroughly examine the performance of our proposed algorithm, we have implemented a number of RS (both traditional and social) and have evaluated them on the same Epinions dataset.
6.1.1 Baseline Systems
Those systems are given for reference purposes, in order to estimate the relative performance improvement of the other approaches. ItemMean recommends each item according to the mean value of the evaluations received for that item so far, treating each rater equally (that is, without considering if there exist any relationships between the raters). On the other hand, UserMean recommends each item according to the mean value of the evaluations given by the target user to other items so far, treating each item equally (and not considering any correlations between the items).
6.1.2 Collaborative Filtering and Item-Based Recommendation
At the heart of those traditional RS is the prediction formula proposed in [2]:
$$\hat{r}_{u_t,i_t} = \bar{r}_{u_t} + \frac{\sum_{c=1}^{|N|} w_{u_t,u_c} (r_{u_c,i_t} - \bar{r}_{u_c})}{\sum_{c=1}^{|N|} w_{u_t,u_c}} \tag{8}$$
where $u_t$ is the target user and $u_c \in U_N$ are all of his neighbors for whom the similarity value $w_{u_t,u_c}$ is computable. Equation 8 takes an equivalent form for the item-based recommender, where $\bar{r}_{u_t}$ is substituted by $\bar{r}_{i_t}$, $w_{u_t,u_c}$ by $w_{i_t,i_c}$, and $i_c \in I_N$ is now the set of items similar to $i_t$ (whose similarity value $w_{i_t,i_c}$ is computable).
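The mean-centred weighted prediction of [2] can be illustrated with a short Python sketch. Each neighbor contributes the deviation of his rating from his own mean, weighted by his similarity to the target user. The function name `predict_cf`, the tuple layout and the numbers are hypothetical, and returning `None` when no weight is available is an assumption about how uncovered ratings are handled.

```python
def predict_cf(target_mean, neighbours):
    """Mean-centred weighted prediction.

    neighbours: list of (similarity, neighbour_rating_for_item, neighbour_mean).
    Returns None when the similarity weights sum to zero (no prediction).
    """
    denominator = sum(w for w, _, _ in neighbours)
    if denominator == 0:
        return None
    numerator = sum(w * (rating - mean) for w, rating, mean in neighbours)
    return target_mean + numerator / denominator


# Hypothetical target user with mean rating 3.5 and two neighbours.
prediction = predict_cf(3.5, [(0.8, 5, 4.0), (0.2, 2, 3.0)])
print(prediction)
```

The same function covers the item-based variant if the tuples describe similar items instead of similar users and the first argument is the mean rating of the target item.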
Table 2: Results on the Epinions Dataset (for a list of 5 recommended items)
System                                          RMSE   Coverage   Novelty   Diversity
A. Baseline
A.1 ItemMean                                    1.09   86.43%     11.89%    24.23%
A.2 UserMean                                    1.20   98.58%      9.70%    19.42%
B. Collaborative Filtering
B.1 Manhattan Similarity (All Neighbors)        1.07   79.57%     20.11%    56.23%
C. Item-Based Recommendation
C.1 Manhattan Similarity (All Similar Items)    1.20   39.29%     16.86%    45.26%
D. Trust-based Approaches
D.1 MoleTrust-1                                 1.23   25.58%     29.16%    43.62%
D.2 MoleTrust-2                                 1.16   56.52%     32.31%    54.02%
D.3 MoleTrust-3                                 1.12   70.89%     42.13%    56.65%
D.4 TidalTrust                                  1.08   74.67%     45.38%    59.17%
E. Our Recommender
E.1 Personalized Item Clustering                1.05   58.17%     53.11%    63.04%
6.1.3 Trust-based Systems
Trust-based systems are roughly put into two large categories according to the way trust values are processed [6]. The first approach is to accumulate all available trust statements in the RS in order to estimate the system-wide influence of each user. The RS that follow this principle are also known as reputation systems and rely on global trust metrics in order to calculate the reputation of each and every user. The second approach is to examine trust in a user-centric level; the emphasis is placed on each individual user and the RS departs from her in order to explore her trust network.
Since our algorithm processes the trust network on a local rather than a global level, we sought comparisons with two other SRS based on local trust metrics. The first SRS is based on the gradual trust metric MoleTrust [9] proposed by Massa and Avesani. The trust graph is firstly transformed into an acyclic form (a tree) by removing all loops in it and then the trust statements are accumulated in a depth-first fashion, starting from each user, up to each and every other user (in the trust network). The propagation horizon determines the length of the exploration; the most common forms being MoleTrust-1, where only the users that the target user trusts are considered, and MoleTrust-2, where the exploration also includes those trusted by those the target user trusts. More formally, if $T_{u_t}$ is the set that includes all users in $u_t$’s trust network that have rated item $i_t$ (which has not been evaluated by the target user yet), then the recommendation value $\hat{r}_{u_t,i_t}$ is approximated using the following formula (trust-based collaborative filtering):
$$\hat{r}_{u_t,i_t} = \bar{r}_{u_t} + \frac{\sum_{u \in T_{u_t}} t_{u_t,u} (r_{u,i_t} - \bar{r}_u)}{\sum_{u \in T_{u_t}} t_{u_t,u}} \tag{9}$$
where $\bar{r}_{u_t}$ is the mean of the ratings $u_t$ has provided so far.
The second RS is based on another popular gradual trust metric, TidalTrust, proposed by Golbeck [6]. TidalTrust is different from MoleTrust in the sense that no propagation horizon is required for the accumulation of trust; instead the shortest path from the target user to each other user in the trust network is computed. All trust paths above a predefined threshold form the Web of Trust (WOT) for that particular user. If there exists more than one trust path between two users, then the one with the biggest trust value is chosen. If $WOT_{u_t}$ is the set that includes those users in $u_t$’s web of trust network that have rated item $i_t$, then the recommendation value $\hat{r}_{u_t,i_t}$ is approximated using the formula (trust-based weighted mean):
$$\hat{r}_{u_t,i_t} = \frac{\sum_{u \in WOT_{u_t}} t_{u_t,u}\, r_{u,i_t}}{\sum_{u \in WOT_{u_t}} t_{u_t,u}} \tag{10}$$
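The two trust-based predictors can be sketched in Python as below. The sketch covers only the final aggregation step of Equations 9 and 10; the computation of the propagated trust values themselves (MoleTrust or TidalTrust) is not shown. Function names, tuple layouts and numbers are hypothetical.

```python
def trust_cf(target_mean, trusted):
    """Trust-based collaborative filtering (Equation 9).

    trusted: list of (trust, rating_for_item, rater_mean) tuples.
    """
    denominator = sum(t for t, _, _ in trusted)
    if denominator == 0:
        return None  # no trusted rater available (assumption)
    numerator = sum(t * (rating - mean) for t, rating, mean in trusted)
    return target_mean + numerator / denominator


def trust_weighted_mean(trusted):
    """Trust-based weighted mean (Equation 10).

    trusted: list of (trust, rating_for_item) tuples.
    """
    denominator = sum(t for t, _ in trusted)
    if denominator == 0:
        return None
    return sum(t * rating for t, rating in trusted) / denominator


# With binary trust (all weights 1) both formulas reduce to plain averages.
binary_cf = trust_cf(3.0, [(1, 5, 4.0), (1, 3, 3.5)])
binary_mean = trust_weighted_mean([(1, 5), (1, 3)])
# With graded trust the more trusted rater dominates the prediction.
graded_mean = trust_weighted_mean([(0.9, 5), (0.1, 3)])
print(binary_cf, binary_mean, graded_mean)
```

The contrast between the binary and graded calls shows why unweighted trust statements, such as those of the Epinions dataset, leave the weighting in both formulas without effect.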
6.2 Results
Table 2 summarizes some first results on the set of the performance metrics presented earlier in this section. All results were obtained on the Epinions dataset by using the leave-one-out validation methodology, for a list of 5 recommended items. Naturally, a lower RMSE score means more accurate predictions while higher scores for coverage, novelty and diversity signify that the RS is able to produce recommendations for more of its users, that are more novel and more diverse respectively (the last three metrics are given in the percentage scale). A RS is then considered to outperform another if it achieves a better outcome on all (or most of) the performance metrics.
A first observation is that our algorithm exhibits a steady performance lead in terms of the accuracy, the novelty and the diversity of the recommendations. We attribute these encouraging results, especially the accuracy of the recommendations, to the way the personal network of each user is processed and his influential neighbors are located (subsection 3.2). Moreover, the increased novelty and diversity of the recommended items are due to the personalized item clustering strategy that we have followed. Indeed, in many cases, this technique manages to capture the diverse interests of each individual user.
Some of the trust-based systems display a very satisfactory behavior in terms of the ratings’ coverage. This stems from the aggressive manner in which they process the trust network, by accumulating all users up to a certain depth. It is for this reason, of not being selective, that they fail to achieve better results in terms of the accuracy of the recommendations, even though they are able to cover more users. However, trust statements in the dataset used in the experiments are binary; had they assumed values in a broader range, then Equations 9 and 10 would have revealed their full potential.
Lastly, the baseline systems may exhibit very satisfactory results in terms of the accuracy and coverage metrics (particularly ItemMean) but those are rather the consequences of the peculiarities of the dataset at hand. As has already been discussed, a RS that would recommend any item with a value in the range [4,5] would exhibit satisfactory results. It is obvious, however, that this approach is not plausible for any practical RS.
7. CONCLUSIONS AND FUTURE WORK
In this work, we proposed a novel social recommendation methodology based on personalized item clustering. Even though the initial results are satisfactory, we feel there is room for further enhancements on our algorithm. More specifically, it would be desirable to improve the procedure of the formation of the personal network. A line of research we are currently examining is to model trust and similarity as the marginal probabilities of an unknown joint probability distribution that we would like to approximate. Then user proximity would be estimated by directly sampling from the aforementioned distribution.
The spectral clustering algorithm could also be further developed by introducing some criteria that would examine the size and the quality of the produced clusters. This may be achieved by considering different clustering policies, such as fuzzy k-means clustering, along with different distance functions (e.g. Chebyshev or Mahalanobis distance) for the placement of the items within the clusters.
Finally, the properties of the random walk on the item consumption network (i.e. mixing rate) may be fine-tuned if we consider different transition probabilities, ones that would incorporate more information about the characteristics of each item node. A possible line of research in this area is to use more fine-grained approaches to model the relationship between the items.
8. REFERENCES
[1] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutierrez. Recommender systems survey. Knowledge-Based Systems, 46:109–132, 2013.
[2] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998.
[3] P. Castells, S. Vargas, and J. Wang. Novelty and diversity metrics for recommender systems: Choice, discovery and relevance. In International Workshop on Diversity in Document Retrieval (DDR 2011) at the 33rd European Conference on Information Retrieval (ECIR 2011), Apr. 2011.
[4] T. DuBois, J. Golbeck, J. Kleint, and A. Srinivasan. Improving recommendation accuracy by clustering social networks with trust. In Recommender Systems & the Social Web, volume 532, pages 1–8, 2009.
[5] G. Pitsilis, X. Zhang, and W. Wang. Clustering recommenders in collaborative filtering using explicit trust information. In 5th IFIP WG 11.11 International Conference, IFIPTM 2011, Copenhagen, Denmark, June 29 - July 1, 2011, Proceedings, volume 358, pages 82–97. Springer Berlin Heidelberg, 2011.
[6] J. A. Golbeck. Computing and applying trust in web-based social networks. PhD thesis, College Park, MD, USA, 2005. AAI3178583.
[7] Q. Liu, B. Xiang, E. Chen, Y. Ge, H. Xiong, T. Bao, and Y. Zheng. Influential seed items recommendation. In Proceedings of the sixth ACM conference on Recommender systems, RecSys ’12, pages 245–248, New York, NY, USA, 2012. ACM.
[8] L. Lovasz. Random walks on graphs: A survey, 1993.
[9] P. Massa and P. Avesani. Trust metrics on controversial users: balancing between tyranny of the majority and echo chambers. International Journal on Semantic Web and Information Systems, 2007.
[10] P. Massa and B. Bhattacharjee. Using trust in recommender systems: An experimental analysis. In Proceedings of iTrust2004 International Conference, pages 221–235, 2004.
[11] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, pages 849–856. MIT Press, 2001.
[12] M. C. Pham, Y. Cao, R. Klamma, and M. Jarke. A clustering approach for collaborative filtering recommendation using social network analysis. Journal of Universal Computer Science, 17(4):583–604, Feb. 2011.
[13] P. Singla and M. Richardson. Yes, there is a correlation: from social networks to personal behavior on the web. In Proceedings of the 17th international conference on World Wide Web, WWW ’08, pages 655–664, New York, NY, USA, 2008. ACM.
[14] J. Sun and J. Tang. A survey of models and algorithms for social influence analysis. In C. C. Aggarwal, editor, Social Network Data Analytics, pages 177–214. Springer US, 2011.
[15] S. Wasserman and K. Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1st edition, November 1994.