Computers and Software
(IRECOS)
Contents
Privacy-Preserving Distributed Collaborative Filtering Using Secure Set Operations
by Chongjing Sun, Yan Fu, Hui Gao, Junlin Zhou
A Technique to Mine Clusters Using Privacy Preserving Data Mining
by J. Anitha, R. Rangarajan
Secured and Encrypted Data Aggregation with Message Authentication Code in Wireless Sensor Networks
by A. Latha, S. Jayashri
Hybrid Approach for Energy Optimization in Wireless Sensor Networks Using ABC and Firefly Algorithms
by T. Shankar, S. Shanmugavel
Distributed Relay Node Selection and Assignment Technique for Cooperative Wireless Networks
by S. Sadasivam, G. Athisha
Improving Network Life Time of Wireless Sensor Network Using LT Codes Under Erasure Environment
by V. Nithya, B. Ramachandran
On the Performance of MANET Using QoS Protocol
by B. Nancharaiah, B. Chandra Mohan
Performance Analysis of Modulation and Coding to Maximize the Lifetime of Wireless Sensor Network
by M. Sheik Dawood, G. Athisha
Energy Aware Zone Routing Protocol Using Power Save Technique AFECA
by Ravi G., K. R. Kashwan
Design of Vertical Handoff Initiation and Decision Algorithm in Heterogeneous Wireless Networks
by S. Aghalya, A. Sivasubramanian
Analysis of Depth Based Routing Protocols in Under Water Sensor Networks
by J. V. Anand, S. Titus
Study of Energy Efficient Protocols Using Data Aggregation in Wireless Sensor Network
by Nagendra Nath Giri, G. Mahadevan
EECPS-WSN: Energy Efficient Cumulative Protocol Suite for Wireless Sensor Network
by Nagendra Nath Giri, G. Mahadevan
A Case Study of Using RM-ODP in Mobile Cloud Computing Applications
by M. Jebbar, A. Sekkaki, O. Benammar
Spam Detection and Elimination of Messages from Twitter
by Sajin S. Chandran, Murugappan S.
Improving Search Results Through Reducing Replica in User Profile
by P. Srinivasan, K. Batri
An Evaluation of the Movie Song Browser System Among IT and Non-IT Users
by Munauwarah, Nazlena Mohamad Ali, Hyowon Lee
An Access Control Model of Web Services Based on Multifactor Trust Management
by R. Joseph Manoj, A. Chandrasekhar
Performance Evaluation of the Hearing Impaired Speech Recognition in Noisy Environment
by C. Jeyalakshmi, V. Krishnamurthi, A. Revathy
SEVALERPS a New EX-ANTE Multi-Criteria Method for ERP Selection
by Abdelilah Khaled, Mohammed Abdou Janati-Idrissi
A Novel Expert System in Hospital Location Analysis with the Aid of Adaptive Artificial Bee Colony (AABC)
by K. Janaki, N. Radhakrishnan
Design of High Speed Serial-Serial Multiplier for OFDM Applications
by N. Saravanakumar, A. Nirmal Kumar, K. N. Vijeyakumar, M. K. Ananda Moorthy
Feature Based Image Retrieval Using Fused Sift and Surf Features
by V. Vijayarajan, M. Dinakaran
A New Multibiometric Identification Method Based on a Decision Tree and a Parallel Processing Strategy
by Kamel Aizi, Mohamed Ouslim
Computed Tomography Images Restoration Using Anisotropic Diffusion Regularization
by Faouzi Benzarti, Hamid Amiri
Secure Medical Image Retrieval Using Dynamic Binary Encoded Watermark
by A. Umaamaheshvari, K. Thanushkodi
Microarray Gene Expression and Multiclass Cancer Classification Using Improved PSO Based Evolutionary Fuzzy ELM Classifier with ICGA Gene Selection
by T. Karthikeyan, R. Balakrishnan
Comparative Analysis of Intrusion Detection System with Mining
by S. Vinila Jinny, J. Jayakumari
Enhanced Distributed Text Document Clustering Based on Semantics
by J. E. Judith, J. Jayakumari
Privacy-Preserving Distributed Collaborative Filtering Using Secure Set Operations
Chongjing Sun, Yan Fu, Hui Gao, Junlin Zhou
Abstract – Collaborative filtering is widely used in fields such as e-commerce and search engines. To produce better recommendations, many data owners want to collaborate with each other to build a shared model. Because of privacy concerns, however, a data owner is reluctant to reveal its data to others. To solve this problem, we present a privacy-preserving approach based on secure set operations and encryption. Our method first adopts the private set intersection cardinality protocol to compute the user similarities. It then uses homomorphic encryption to compute the predicted rating values for the unrated items. Finally, the model recommends the top-k unrated items to each user.
We show that distributed collaborative filtering based on our approach incurs zero loss of recommendation accuracy while preserving the privacy of the different data owners. Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved.
Keywords: Privacy Preserving, Set Operations, Collaborative Filtering
Nomenclature
$U$: The set of all users
$I$: The set of all items
$I_u$: The set of items that user $u$ rated
$r_{ui}$: Rating value given by user $u$ to item $i$
$s_{uv}$: Similarity of users $u$ and $v$
$O^t$: The $t$-th participant in the distributed system
$u_i^t$: The $i$-th user of the participant $O^t$
$I_i^t$: The item set rated by $u_i^t$
$\mathbb{Z}_q$: The integers from 0 to $q-1$
$\Pi$: Random permutation function
$H$: Hash function
$E_{pk}$: Encrypt a plaintext with public key $pk$
$D_{sk}$: Decrypt a ciphertext with private key $sk$
I. Introduction
Nowadays, the explosive growth of information on the Web leads to the information overload problem, in which people get lost in the mass of information. To provide better services to users and to profit more from product sales, many information filtering and recommendation techniques have been proposed [1], such as classic collaborative filtering, location-based recommendation services [2], and context-dependent recommendation [3].
These techniques help people filter out redundant information, shorten search time, and find the personalized items in which they are most interested.
Recommender systems play an important role in filtering information, and many approaches have been proposed.
Among them, Collaborative Filtering (CF) is a classic technique that is widely used on many e-commerce sites.
Small or start-up companies usually do not have enough data to provide satisfactory recommendations to their customers. They want to collaborate with other companies to build a shared recommender system that can provide better recommendations. The problem is that the other companies are reluctant to share their data because of the privacy of their customers.
For example, some customers buy private products and do not want others to know. Data sharing may violate the privacy of these users. Under this condition, we consider how to build a shared recommender system without disclosing private information.
In this paper, we focus on binary user-item ratings, such as whether or not a user bought a product. Hence the private information is whether a user bought or rated an item. Based on secure set operations and homomorphic encryption, we propose an algorithm that builds the shared recommender model without disclosing private information and with zero loss of accuracy. User-based CF mainly has two steps: similarity computation and rating score prediction.
In the first step, we adopt the private set intersection cardinality protocol to compute the similarity between users without revealing the true ratings of each user. In the second step, we design an approach based on homomorphic encryption to generate the predicted ratings for the unrated items.
Finally, the model selects the top-k unrated items for each user based on the predicted ratings.
C. J. Sun, Y. Fu, H. Gao, J. L. Zhou
Copyright © 2013 Praise Worthy Prize S.r.l. - All rights reserved International Review on Computers and Software, Vol. 8, N. 10
Techniques for solving privacy and security problems have developed very fast, especially in the mobile [4] and RFID [5] areas. In this paper, we focus on the privacy problem in distributed CF, which runs the algorithms on rating data stored in multiple repositories. Polat and Du [6] proposed privacy-preserving algorithms for Collaborative Filtering recommendation on horizontally or vertically partitioned data. Yakut and Polat [7] presented privacy-preserving schemes to make item-based predictions on arbitrarily distributed data.
Both works cause accuracy loss when recommending items to users. Polat and Du [8] designed methods for privacy-preserving CF on vertically distributed data, which select all the users as the targeted user's neighbours. In our work, we only select the top-k users as its neighbours. Among the distributed CF techniques, Kaleli and Polat [9] achieved privacy preservation for a model-based CF recommendation (the naive Bayesian classifier-based CF). Yakut and Polat [10] also gave a solution for a privacy-preserving model-based CF recommendation (the SVD-based CF). In our work, we solve the privacy-preserving problem for memory-based CF. We combine secure set operations and homomorphic encryption to design a new privacy-preserving scheme, which can conduct shared user-based CF recommendation without any loss of accuracy.
The rest of this paper is organized as follows. In Section 2, we introduce the preliminaries and define the research problem in this paper. In Section 3, we devise the privacy-preserving distributed collaborative filtering approach, and evaluate it in Section 4. Finally, we conclude this paper in Section 5.
II. Preliminaries and Problem Definition
In this paper, we focus on recommender systems based on the collaborative filtering technique, which is one of the most successful technologies. Specifically, this work solves the privacy-preserving problem concerning the user-based CF model, in which there is a list of users $U = \{u_1, u_2, \ldots, u_n\}$ and a list of items $I = \{i_1, i_2, \ldots, i_m\}$. All the binary ratings can be summarized in a user-item table, which contains the rating score $r_{ui}$ provided by user $u$ for item $i$; $r_{ui}$ is set to 1 if $u$ has rated item $i$, and to 0 otherwise. Each user $u$ has a set of rated items:
$$I_u = \{\, i \mid i \in I,\ r_{ui} \neq 0 \,\}$$
Many metrics have been proposed to compute the similarity between two users. Suppose the rating vectors of users $u$ and $v$ are $r_u$ and $r_v$, respectively. Similarity measures for binary ratings are listed in Table I.
For binary ratings, the Cosine measure is equivalent to Salton's measure. As our work adopts secure set operations, we put emphasis on the measures based on set operations. After the similarities between any two users are obtained, the predicted rating score for an item can be calculated by Formula (1):
$$r_{ui} = \frac{\sum_{v \in N_u} s_{uv}\, r_{vi}}{\sum_{v \in N_u} s_{uv}} \qquad (1)$$
where $N_u$ denotes the top-k most similar users of the target user $u$.
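As an illustration of Formula (1), the following minimal Python sketch predicts a binary rating from the k most similar neighbours. The function name `predict_rating` and the dictionary-based input layout are assumptions made for this example only; they do not come from the paper.

```python
def predict_rating(similarities, ratings, item, k):
    """Predict r_ui per Formula (1) from the k most similar users.

    similarities: dict neighbour -> s_uv (hypothetical input layout)
    ratings: dict neighbour -> set of items that neighbour rated (binary ratings)
    """
    # N_u: the k neighbours with the largest similarity to the target user.
    neighbours = sorted(similarities, key=similarities.get, reverse=True)[:k]
    denom = sum(similarities[v] for v in neighbours)
    if denom == 0:
        return 0.0
    # r_vi is 1 when neighbour v rated the item, so only those terms contribute.
    numer = sum(similarities[v] for v in neighbours if item in ratings[v])
    return numer / denom


sims = {"v1": 0.5, "v2": 0.25, "v3": 0.1}
rated = {"v1": {"a", "b"}, "v2": {"b"}, "v3": {"c"}}
score_a = predict_rating(sims, rated, "a", k=2)
score_b = predict_rating(sims, rated, "b", k=2)
score_c = predict_rating(sims, rated, "c", k=2)
```

With k = 2 the neighbour `v3` is excluded, so item `c` receives a score of zero even though `v3` rated it.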
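TABLE I. SIMILARITY MEASURES [11]
Name of measure | Formula of similarity $s_{uv}$
Cosine | $\dfrac{r_u \cdot r_v}{\lVert r_u \rVert_2 \, \lVert r_v \rVert_2}$
Salton | $\dfrac{|I_u \cap I_v|}{\sqrt{|I_u| \, |I_v|}}$
Jaccard | $\dfrac{|I_u \cap I_v|}{|I_u \cup I_v|}$
Dice | $\dfrac{2\,|I_u \cap I_v|}{|I_u| + |I_v|}$
LHN-I | $\dfrac{|I_u \cap I_v|}{|I_u| \, |I_v|}$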
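The set-based measures of Table I can be sketched in a few lines of Python. This is only an illustration under the assumption that each user's rated items are held in a non-empty Python set; the function names are hypothetical. The second function checks numerically that, for 0/1 rating vectors, the Cosine measure coincides with Salton's measure.

```python
from math import sqrt


def set_similarities(items_u, items_v):
    """Set-based measures of Table I for two non-empty sets of rated items."""
    inter = len(items_u & items_v)
    union = len(items_u | items_v)
    size_u, size_v = len(items_u), len(items_v)
    return {
        "salton": inter / sqrt(size_u * size_v),
        "jaccard": inter / union,
        "dice": 2 * inter / (size_u + size_v),
        "lhn1": inter / (size_u * size_v),
    }


def cosine_binary(items_u, items_v, all_items):
    """Cosine similarity of the 0/1 rating vectors built over all_items."""
    ru = [1 if i in items_u else 0 for i in all_items]
    rv = [1 if i in items_v else 0 for i in all_items]
    dot = sum(a * b for a, b in zip(ru, rv))
    # For 0/1 vectors the squared Euclidean norm equals the number of ones.
    return dot / (sqrt(sum(ru)) * sqrt(sum(rv)))


items_u = {"a", "b", "c", "d"}
items_v = {"c", "d", "e", "f"}
sim = set_similarities(items_u, items_v)
cos = cosine_binary(items_u, items_v, sorted(items_u | items_v))
```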
A secure computation technique [12] needs to be designed for computation between partners without leaking information. In our work, we adopt secure set operations and homomorphic encryption to achieve our task, privacy-preserving collaborative filtering. Secure set operations allow one party to compute the result of a set operation with another party such that they learn nothing about each other's inputs beyond the result of the set operation.
From the previous analysis, we only need the cardinalities of the set intersection and set union. Private Set Intersection Cardinality (PSI-CA) and Private Set Union Cardinality (PSU-CA) protocols have been studied to ensure that the parties learn only the size of the set intersection or union.
Emiliano [13] proposed solutions to this problem whose complexity is linear in the size of the input sets.
Homomorphic encryption is a technique that allows certain operations on the ciphertext. Given two messages $m_1$ and $m_2$, additively homomorphic encryption schemes satisfy the following properties:
$$D_{sk}\big(E_{pk}(m_1) \cdot E_{pk}(m_2)\big) = m_1 + m_2 \qquad (2)$$
$$D_{sk}\big(E_{pk}(m_1)^{m_2}\big) = m_1 \cdot m_2 \qquad (3)$$
We adopt the Paillier cryptosystem [14] in our method, a classic additively homomorphic encryption system.
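Properties (2) and (3) can be checked with a toy Paillier implementation. The sketch below is an illustration only: it assumes the common simplification $g = n + 1$, uses two small hard-coded primes that offer no security, and all function names are hypothetical.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Toy key generation from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """E_pk(m) = (1 + n)^m * r^n mod n^2, with (1 + n)^m = 1 + m*n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def decrypt(n, sk, c):
    """D_sk(c) = L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n


n, sk = keygen()
n2 = n * n
c1, c2 = encrypt(n, 20), encrypt(n, 22)
sum_plain = decrypt(n, sk, c1 * c2 % n2)       # Formula (2)
prod_plain = decrypt(n, sk, pow(c1, 22, n2))   # Formula (3)
```

Multiplying the two ciphertexts yields an encryption of 20 + 22, and raising the first ciphertext to the power 22 yields an encryption of 20 · 22, as long as the results stay below $n$.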
Problem definition. Suppose that there are p parties and m items in the distributed system, and the data is horizontally partitioned. Each party has a number of users who rate the items, i.e., $O^t$ has the following users:
$$U^t = \{u_1^t, u_2^t, \ldots, u_{n_t}^t\}$$
User $u_i^t$ has an item rating vector $R_i^t$, and the item set rated by $u_i^t$ is represented by $I_i^t$.
The overall architecture of the privacy-preserving distributed CF system is depicted in Fig. 1.
Fig. 1. The distributed collaborative filtering infrastructure
The p parties cooperate with each other to establish a better shared CF model while preserving the private information about their users' preferences, i.e., the ratings for items. We design the protocols under the semi-honest model [15], which means that each participant follows the protocol strictly but can keep the intermediate data to try to infer more information.
It is reasonable for small or medium companies to build a shared collaborative filtering model under the semi-honest model, in which they want to benefit from the data sharing without invading others' privacy.
III. New Privacy-Preserving Distributed Collaborative Filtering
In this section, we combine the secure set operation with the encryption technique to design the privacy-preserving CF schemes under the distributed system.
Private similarity computation
In order to recommend items to a target user, we need to compute the similarities between this user and all the others. As the other users are distributed across different parties, we need to design a Private Similarity Computation (PSC) that computes the similarity between users without leaking their personal ratings. To explain our method clearly, we give the PSC for data distributed across two parties; it can easily be extended to multiple parties.
Suppose that two parties Alice and Bob have na and nb users respectively. Then we simplify the representation of similarity matrix of these users as:
$$S = \begin{pmatrix} A & C \\ C^{T} & B \end{pmatrix} \qquad (4)$$
Here $A$ and $B$ are $n_a \times n_a$ and $n_b \times n_b$ matrices, respectively, which hold the similarities among the users of Alice and among the users of Bob. $C$ is an $n_a \times n_b$ matrix holding the similarity between two users, one from Alice and the other from Bob. To select the top-k most similar users to a targeted user, Alice only needs to know the matrices $A$ and $C$, while Bob only needs to know $B$ and $C$. The problem therefore reduces to computing the matrix $C$ securely without revealing the individual rating scores. Based on the PSI-CA [13], we propose the private similarity computation shown in Protocol 1.

Protocol 1. Private set intersection cardinality
Input. Alice: $I_1^a, I_2^a, \ldots, I_{n_a}^a$. Bob: $I_1^b, I_2^b, \ldots, I_{n_b}^b$.
Alice: chooses $r_1^a, r_2^a \leftarrow \mathbb{Z}_q$ and computes $x = g^{r_1^a}$. For $1 \le i \le n_a$: $HI_i^a = H(I_i^a)$, $RH_i^a = (HI_i^a)^{r_2^a}$.
Alice sends to Bob: $x, RH_1^a, \ldots, RH_{n_a}^a$.
Bob: chooses $r_1^b, r_2^b \leftarrow \mathbb{Z}_q$ and computes $y = g^{r_1^b}$. For $1 \le i \le n_a$: $DR_i^a = (RH_i^a)^{r_2^b}$, $DR_i^a = \Pi(DR_i^a)$. For $1 \le j \le n_b$: $HI_j^b = H(I_j^b)$, $RH_j^b = x^{r_1^b} (HI_j^b)^{r_2^b}$, $DH_j^b = H'(RH_j^b)$.
Bob sends to Alice: $y, DR_1^a, \ldots, DR_{n_a}^a, DH_1^b, \ldots, DH_{n_b}^b$.
Alice: For $1 \le i \le n_a$: $TR_i^a = y^{r_1^a} (DR_i^a)^{1/r_2^a \bmod q}$, $DH_i^a = H'(TR_i^a)$.
Output: $|DH_i^a \cap DH_j^b|$ for $1 \le i \le n_a$ and $1 \le j \le n_b$.

Alice has $n_a$ users, and each user has a rated item set $I_i^a$. Alice and Bob share the common primes $p$ and $q$ with $q \mid p-1$, where $p$ can be 1024 or 2048 bits long and $q$ 160 or 224 bits long. The protocol is conducted on a generator $g$ of the subgroup of size $q$, and two hash functions, $H: \{0,1\}^* \rightarrow \mathbb{Z}_p^*$ and $H': \{0,1\}^* \rightarrow \{0,1\}^k$, where $k$ is the security parameter. The hash functions and exponentiations are applied to each element of a set. From Protocol 1, we can see that Alice learns nothing about which items are in the intersection, as Bob shuffles the set of Alice. A similar privacy proof can be found in [13].
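The message flow of Protocol 1 can be traced with a toy Python sketch for a single pair of users. Everything below is an assumption made for illustration: the group is found by a naive search for a small safe prime $p = 2q + 1$, the hash $H$ is built from SHA-256 and squared so that it lands in the subgroup of order $q$ (which the step $(DR)^{1/r_2^a \bmod q}$ relies on), and the parameter sizes are far too small to be secure.

```python
import hashlib
import random
import secrets


def is_prime(n):
    """Trial division; adequate for the small toy parameters used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def find_group(start=2**31):
    """Find a safe prime p = 2q + 1; the squares mod p form a subgroup of order q."""
    q = start + 1
    while not (is_prime(q) and is_prime(2 * q + 1)):
        q += 2
    return 2 * q + 1, q, 4  # g = 4 = 2^2 generates the order-q subgroup


def H(item, p):
    """Hash an item into the order-q subgroup by squaring (assumed construction)."""
    digest = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return pow(digest % p, 2, p)


def H2(value):
    """Second hash H', applied to a group element."""
    return hashlib.sha256(str(value).encode()).hexdigest()


def psi_cardinality(items_a, items_b):
    p, q, g = find_group()
    # Alice: blind her hashed items with r2a.
    r1a, r2a = secrets.randbelow(q - 1) + 1, secrets.randbelow(q - 1) + 1
    x = pow(g, r1a, p)
    rh_a = [pow(H(i, p), r2a, p) for i in items_a]
    # Bob: re-blind and shuffle Alice's values, and mask his own items.
    r1b, r2b = secrets.randbelow(q - 1) + 1, secrets.randbelow(q - 1) + 1
    y = pow(g, r1b, p)
    dr_a = [pow(v, r2b, p) for v in rh_a]
    random.SystemRandom().shuffle(dr_a)
    dh_b = {H2(pow(x, r1b, p) * pow(H(j, p), r2b, p) % p) for j in items_b}
    # Alice: strip r2a and count the matching masked values.
    inv = pow(r2a, -1, q)
    dh_a = {H2(pow(y, r1a, p) * pow(d, inv, p) % p) for d in dr_a}
    return len(dh_a & dh_b)


common = psi_cardinality({"i1", "i2", "i3", "i5"}, {"i2", "i3", "i4"})
```

Because Bob shuffles the re-blinded values before returning them, Alice can count the matches but cannot tell which of her items produced them.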
Correctness. For the i-th user of Alice and the j-th user of Bob, the rated item sets $I_i^a$ and $I_j^b$ are processed by Protocol 1 as follows:
$$DH_i^a = H'\big(y^{r_1^a} \, (DR_i^a)^{1/r_2^a \bmod q}\big) = H'\big(g^{r_1^a r_1^b} \, H(I_i^a)^{r_2^b}\big)$$
$$DH_j^b = H'\big(x^{r_1^b} \, (HI_j^b)^{r_2^b}\big) = H'\big(g^{r_1^a r_1^b} \, H(I_j^b)^{r_2^b}\big)$$
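Protocol 2. Secure top-n recommendation
Goal: $O^t$ recommends the top-$n$ items to $u_i^t$.
Functions: Paillier_crypt, aggregate, top_neigs, reorder.
Party $O^t$ does:
  $(S, Id^P, Id^U) = \text{top\_neigs}(u_i^t, k)$
  $(pk, sk) = \text{Paillier\_crypt}()$
  $a_t = \text{aggregate}(Id^U, O^t)$, $p_t = E_{pk}(a_t)$
  sends $pk, p_t, Id^P, Id^U$ to all other parties
  $(C, f) = \text{reorder}(Id^P)$
  $s_0 = p_t$
For $j = 1 : |C| - 1$
  $x = f(j)$
  $O^x$ does:
    $a_x = \text{aggregate}(Id^U, O^x)$
    $s_j = E_{pk}(a_x) \cdot s_{j-1}$
    sends $s_j$ to $O^{f(j+1)}$
End
$O^{f(|C|)}$ does:
  $a_{f(|C|)} = \text{aggregate}(Id^U, O^{f(|C|)})$
  $s_{|C|} = E_{pk}(a_{f(|C|)}) \cdot s_{|C|-1}$
$O^t$ does:
  $r_{u_i^t} = D_{sk}(s_{|C|}) \,/\, \text{sum}(S)$
Return the top-$n$ unrated items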
Therefore, if two values $i^a = i^b$ with $i^a \in I_i^a$ and $i^b \in I_j^b$, then there must exist two values $d^a \in DH_i^a$ and $d^b \in DH_j^b$ with $d^a = d^b$. Alice then learns the set intersection cardinality by counting the number of matching pairs.
Suppose that the number of items rated by each user can be shared with all parties. Then the similarities shown in Table I can be computed directly once the set intersection cardinality has been securely learned.
For the Jaccard measure, the matrix $C$ in Formula (4) can be computed as Formula (5):
$$C_{ij} = \frac{|I_i^a \cap I_j^b|}{|I_i^a \cup I_j^b|} = \frac{|DH_i^a \cap DH_j^b|}{|I_i^a| + |I_j^b| - |DH_i^a \cap DH_j^b|} \qquad (5)$$
The other similarity measures can be computed similarly.
Finally, Alice shares the matrix $C$ with Bob. If there are p parties in the distributed system, then each pair of them needs to cooperate to securely compute their matrix $C$.
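Formula (5) needs only the intersection cardinalities and the shared set sizes. A minimal Python sketch follows, assuming the cardinalities are already available as a nested list and that every user has rated at least one item; the function name `jaccard_matrix` is hypothetical.

```python
def jaccard_matrix(inter_card, sizes_a, sizes_b):
    """C[i][j] per Formula (5): |intersection| / (|I_i^a| + |I_j^b| - |intersection|)."""
    return [
        [
            inter_card[i][j] / (sizes_a[i] + sizes_b[j] - inter_card[i][j])
            for j in range(len(sizes_b))
        ]
        for i in range(len(sizes_a))
    ]


# Intersection cardinalities between 2 users of Alice and 2 users of Bob.
inter = [[2, 0], [1, 3]]
C = jaccard_matrix(inter, sizes_a=[4, 3], sizes_b=[4, 5])
```

The union size never has to be computed by a separate protocol, since it follows from the two set sizes and the intersection cardinality.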
III.1. Secure Top-N Item Recommendation
For a targeted user, the top-k neighbors are selected according to their similarities, and the rating predictions are produced by aggregating the ratings of these neighbors.
Suppose that the party $O^t$ wants to recommend items to its customer $u_i^t$. Protocol 2 shows the secure top-n item recommendation. The aggregate function in the protocol is defined in Formula (6):
$$\text{aggregate}(Id^U, O^x) = \sum_{v \in \{u_j^x \,\mid\, j \in Id^U\}} s_{u_i^t v} \, r_v \qquad (6)$$
The party $O^t$ first selects the top-k most similar users, which are usually distributed across different parties. The protocol gives the party indices $Id^P$ and the selected user indices $Id^U$ in each party. Then $O^t$ generates a pair of public and secret keys using the Paillier cryptosystem. According to Formula (2), the product of the encrypted ratings decrypts to the sum of the ratings.
Hence, the summation can be calculated without disclosing the rating values of each user. Finally, $O^t$ decrypts the ciphertext, obtains the predicted rating for each item, and selects the top-n items for the targeted user.
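The chained aggregation can be simulated in a single Python process. The sketch below rests on several assumptions that are not in the paper: similarities are scaled to integers so that they fit the Paillier plaintext space, the toy Paillier code uses $g = n + 1$ with small insecure primes, message passing between parties is modeled as a simple loop, and all names are hypothetical.

```python
import math
import secrets

P, Q = 1009, 1013  # toy primes, insecure
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def encrypt(m):
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):  # only the target party O^t holds LAM and MU
    return (pow(c, LAM, N2) - 1) // N * MU % N


def aggregate(neighbours, n_items):
    """Formula (6): per-item sum of similarity times binary rating."""
    return [sum(s for s, rated in neighbours if i in rated) for i in range(n_items)]


n_items = 4
# Selected neighbours per party: (similarity scaled to an integer, items rated).
party_t = [(50, {0, 1})]
others = [[(25, {1, 2})], [(25, {2, 3}), (10, {0})]]

# O^t encrypts its own aggregate; each other party multiplies its share in.
chain = [encrypt(a) for a in aggregate(party_t, n_items)]
for party in others:
    chain = [encrypt(a) * s % N2 for a, s in zip(aggregate(party, n_items), chain)]

# O^t decrypts the sums and normalizes by the total similarity (Formula (1)).
sums = [decrypt(c) for c in chain]
total = sum(s for party in [party_t] + others for s, _ in party)
predicted = [v / total for v in sums]

rated_by_target = {0}
unrated = [i for i in range(n_items) if i not in rated_by_target]
top_2 = sorted(unrated, key=lambda i: predicted[i], reverse=True)[:2]
```

Each intermediate party sees only ciphertexts under the public key of $O^t$, and $O^t$ sees only the final per-item sums.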
IV. Experimental Evaluation
By analyzing our privacy-preserving protocols, we can conclude that the protocols incur zero loss of accuracy.
Therefore, in this section we show the improvement in recommendation quality when the parties cooperate with one another.
The datasets are Epinions [16] and Friendfeed [17].
We sample the original data; the resulting Epinions dataset contains 4726 users, 3907 items and 164221 ratings in total, while Friendfeed contains 3133 users who collected 4956 items, with 92351 ratings. The metrics used to evaluate the recommendation are precision, recall, F1 and HD. Their definitions can be found in [18].
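For orientation, precision, recall and F1 for one user's recommendation list can be sketched as below. This uses the standard textbook definitions, which is an assumption: the exact definitions used in the experiments are those of [18], and HD is omitted because it is not defined in this text. The function name is hypothetical.

```python
def precision_recall_f1(recommended, relevant):
    """Standard per-user precision, recall and F1 of a recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Four recommended items, of which two appear among the user's held-out items.
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"])
```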
The first experiment illustrates the improvement when each party cooperates with all the others. We divide each dataset into 5 parties. For the users in each party, we recommend the top-n items by analyzing the k most similar users of this party (isolated), compared with the users of all parties (cooperated). Figs. 2 and 3 show the results of the experiments conducted on the two datasets. The x-axis represents the i-th party. Clearly, on both datasets, the recommendation results on the cooperated data are better than the results on the isolated data.
Next, we measure how much improvement can be obtained when a party cooperates with different numbers of parties.
We recommend the top-n items to the users of the first party by analyzing the k most similar users of this party compared with the users from p cooperating parties, where p ranges from 2 to 5. Figs. 4 and 5 show the results on the Epinions and Friendfeed datasets. The trend is that the values of the measures increase as the number of cooperating parties increases.
However, when the number is 4 on Epinions, the values decrease slightly, which means that this party contains some users who act as noise for the users of the first party. We will address this problem in future work to avoid the decrease.
V. Conclusion
In this paper, we focus on the privacy problem of how to build a shared collaborative filtering model without disclosing any user's private information. For this problem, we designed a solution under the semi-honest model.
Theoretical analysis supports that our scheme, which combines secure set operations with encryption, can preserve privacy while maintaining the accuracy of the rating prediction. The experimental results show that the recommendation accuracy can be improved by cooperating with others.
However, a noisy party may decrease the accuracy. Next, we will put our emphasis on this problem.
Acknowledgements
This research work was supported by National Natural Science Foundation of China under Grant No.61003231.
Fig. 2. Recommendation results on Epinions. Panels: (a) Precision, (b) Recall, (c) F1, (d) HD, each plotted over the different parties (1 to 5) for the Isolated and Cooperated settings.
Fig. 3. Recommendation results on Friendfeed. Panels: (a) Precision, (b) Recall, (c) F1, (d) HD, each plotted over the different parties (1 to 5) for the Isolated and Cooperated settings.
Fig. 4. Cooperation with different number of parties on Friendfeed. Panels: (a) Precision, (b) Recall, (c) F1, (d) HD, each plotted over the number of cooperated parties (1 to 5).
Fig. 5. Cooperation with different number of parties on Epinions. Panels: (a) Precision, (b) Recall, (c) F1, (d) HD, each plotted over the number of cooperated parties (1 to 5).
References
[1] Sneha, Y.S., Mahadevan, G., Parvathi, R.M.S., Recommender system based on user ratings: A comprehensive study and future challenges, (2013) International Review on Computers and Software (IRECOS), 8 (7), pp. 1624-1635.
[2] Wu, J., Wu, Z., Mobile location-aware personalized recommendation with clustering-based collaborative filtering, (2012) International Review on Computers and Software (IRECOS), 7 (5), pp. 2231-2238.
[3] Yao, L., Yang, W., A context-aware recommender for trustworthy service, (2012) International Review on Computers and Software (IRECOS), 7 (6), pp. 3354-3359.
[4] Tripathy, P.K., Biswal, D., Multiple server indirect security authentication protocol for mobile networks using elliptic curve cryptography (ECC), (2013) International Review on Computers and Software (IRECOS), 8 (7), pp. 1571-1577.
[5] Eslamnezhad Namin, M., Badihiyeh Aghdam, F., Hosseinzadeh, M., A Secure and Efficient RFID Mutual Authentication Protocol, (2011) International Journal on Communications Antenna and Propagation (IRECAP), 1 (5), pp. 429-433.