HAL Id: hal-01956155
https://hal.archives-ouvertes.fr/hal-01956155
Submitted on 14 Dec 2018
Towards Scalable, Efficient and Privacy Preserving Machine Learning
Rania Talbi, Sara Bouchenak
INSA Lyon, France
{firstname.lastname}@insa-lyon.fr
Poster sections: Context and Motivation; Objectives; Preliminary results; Related work; Design principles; References
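Middleware 2018 Doctoral Symposium, December 10th, Rennes, France.

Context and Motivation
[Figure: companies $C_i$ (including a fraudulent company $C_F$), their local bank transactions $B_i$, and a Central Supervision Authority $A$ computing $M(\bigcup B_i)$.]
- $C_i$: Company i
- $C_F$: Fraudulent company
- $B_i$: Local bank transactions of $C_i$
- $A$: Central Supervision Authority
- $M$: Data mining for fraud detection

DynAmic Privacy Preserving machine Learning Framework (DAPPLE)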
[Figure: DAPPLE architecture, with three panels: Privacy Preserving Classifier Learning, Privacy Preserving Class Prediction, and Incremental update of the data model. Data owners $DO_1, \dots, DO_n$ provide encrypted training data chunks $[S_k^1]_{pk_1}, \dots, [S_k^n]_{pk_n}$; a querier $Q_j$ submits $[X_j]_{pk_j}$ and obtains $[C_j]_{pk_j}$.]
- $DO_i$: Data Owner i
- $Q_j$: Classification Querier j
- $CSP$: Classification Service Provider
- $[w_k]_{pk_w}$: Encrypted data model
- $[X_j]_{pk_j}$: Encrypted classification query
- $[C_j]_{pk_j}$: Encrypted classification response
- $[S_k^i]_{pk_i}$: Encrypted local training data chunk from data owner $DO_i$

Objectives
- Minimize the computational costs incurred by privacy preservation.
- Provide an end-to-end privacy preserving outsourced data classification service.
- Enable a set of mutually untrusted data owners to have a global vision on the union of their data without breaching the privacy of each one of them.
- Enable dynamic data model updates when new training data samples are available.

Preliminary results
- We used a synthetic dataset for fraud detection in a B2B network.
- This dataset contains 1000 bank transactions with 9 attributes each.
- We compare our work to the Ciphermed framework [8].

Related work
Privacy-preserving machine learning (PPML) approaches differ along three axes:
- Different ML algorithms: clustering [1], classification [2], association rule mining [3].
- Different privacy-preservation objectives: ML output protection, original data protection, ...
- Different architectures: distributed [4], outsourced [5].
Privacy preservation techniques are either cryptographic (SMC/HE, GC, OT) or non-cryptographic (privacy-preserving data publishing techniques).
[Figure: privacy / runtime / utility trade-off for each family of techniques.]

Design principles
- Cryptography-based protection (data model, training data, classification queries and responses)
- Decent privacy and utility levels
- Partial homomorphic encryption (PHE) based building blocks
- Efficient runtime
- Entirely outsourced ML computations over encrypted data
- Combine PHE with cryptographic blinding (DTPKC cryptosystem [6]), e.g. $[x]_{pk} \otimes [r]_{pk} = [x \oplus r]_{pk}$. An operation between two parties $U_1$ and $U_2$ proceeds in five steps:
  (1) Blind inputs
  (2) Partially decrypt blinded values
  (3) Decrypt blinded values
  (4) Run operation over blinded values
  (5) Remove blinding from the result
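
The blinding steps above can be illustrated with a minimal sketch of a blinded multiplication. It rests on several assumptions that are not taken from the poster: it uses textbook single-key Paillier rather than the DTPKC cryptosystem [6], so the partial decryption of step (2) is folded into an ordinary decryption; the assignment of steps to $U_1$ and $U_2$ is a guess; the key is built from two small Mersenne primes and is not secure; and all function names are hypothetical. Here $\otimes$ is ciphertext multiplication modulo $N^2$ and $\oplus$ is addition modulo $N$.

```python
import math
import secrets

# Toy Paillier key from two known Mersenne primes (illustration only, not secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # modular inverse; valid because the generator is N + 1


def encrypt(m):
    """Paillier encryption of m (reduced mod N) under the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Paillier decryption with the secret key (LAM, MU)."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add(c1, c2):
    """Homomorphic addition: [a] (x) [b] = [a + b mod N]."""
    return (c1 * c2) % N2


def scale(c, k):
    """Homomorphic scalar multiplication: [a]^k = [k * a mod N]."""
    return pow(c, k % N, N2)


def u2_multiply(cx_blind, cy_blind):
    """Assumed role of U2 (holds the secret key): steps (2)-(4).

    Decrypts the blinded values, multiplies them, and re-encrypts the product.
    U2 only ever sees x + rx and y + ry, never x or y.
    """
    return encrypt(decrypt(cx_blind) * decrypt(cy_blind) % N)


def secure_multiply(cx, cy):
    """Assumed role of U1 (holds only ciphertexts): returns [x * y]."""
    rx, ry = secrets.randbelow(N), secrets.randbelow(N)
    # (1) Blind inputs: [x + rx], [y + ry]
    cx_blind, cy_blind = add(cx, encrypt(rx)), add(cy, encrypt(ry))
    # (2)-(4) U2 works on blinded values only
    ch = u2_multiply(cx_blind, cy_blind)
    # (5) Remove blinding: x*y = h - x*ry - y*rx - rx*ry (mod N)
    unblind = add(add(scale(cx, -ry), scale(cy, -rx)), encrypt(-rx * ry))
    return add(ch, unblind)


if __name__ == "__main__":
    print(decrypt(secure_multiply(encrypt(7), encrypt(6))))  # 42
```

In this sketch the random masks hide the operands from the party that decrypts, while the party that removes the masks never holds the secret key.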

We implemented the VFDT incremental decision tree learning algorithm [7]:
- Naive approach: a combination of low-level privacy-preserving (PP) building blocks.
- 1st optimization: use inline building blocks.
- 2nd optimization: parallel computing.

References
[1] X. Hu et al. Privacy-Preserving K-Means Clustering Upon Negative Databases. ICONIP (4) 2018.
[2] S. Kim et al. Privacy-Preserving Naive Bayes Classification Using Fully Homomorphic Encryption. ICONIP (4) 2018: 349-358.
[3] L. Liu et al. Privacy-Preserving Mining of Association Rule on Outsourced Cloud Data from Multiple Parties. ACISP 2018: 431-451.
[4] H. Yu et al. Privacy-Preserving SVM Classification on Vertically Partitioned Data. PAKDD 2006: 647-656.
[5] T. Li et al. Outsourced privacy-preserving classification service over encrypted data. J. Network and Computer Applications 106: 100-110 (2018).
[6] X. Liu et al. An Efficient Privacy-Preserving Outsourced Calculation Toolkit With Multiple Keys. IEEE Trans. Information Forensics and Security 11(11): 2401-2414 (2016).
[7] P. Domingos et al. Mining high-speed data streams. KDD 2000: 71-80.
[8] R. Bost et al. Machine Learning Classification over Encrypted Data. NDSS 2015.
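
As a complement to the VFDT learner [7] mentioned in the preliminary results, the following minimal sketch shows the Hoeffding-bound test that VFDT uses to decide when a leaf has seen enough samples to split. It operates on plaintext values only; how DAPPLE evaluates this test over encrypted data is not shown, and the function and parameter names are hypothetical. The tie-breaking threshold of the original algorithm is omitted.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range: range R of the split criterion (e.g. 1.0 for a gain in [0, 1])
    delta: allowed probability of choosing the wrong attribute
    n: number of samples seen at the leaf
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # With few samples the bound is loose, so the leaf keeps accumulating data.
    print(should_split(0.30, 0.15, 1.0, 1e-7, 50))
    # With more samples the bound tightens and the same gap justifies a split.
    print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
```

Because the bound shrinks as n grows, the tree can be extended incrementally as new training chunks arrive, without revisiting earlier samples.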