Towards Scalable, Efficient and Privacy Preserving Machine Learning


HAL Id: hal-01956155

https://hal.archives-ouvertes.fr/hal-01956155

Submitted on 14 Dec 2018


Rania Talbi, Sara Bouchenak

INSA Lyon, France

{firstname.lastname}@insa-lyon.fr

December 10th, Middleware 2018's doctoral symposium, Rennes, France.

Context and Motivation

Figure: fraud detection in a B2B network. Legend: C_i, company i; B_i, local bank transactions of C_i; C_F, fraudulent company; A, central supervision authority; M, data mining for fraud detection, applied to the union of all local transactions, M(⋃_i B_i).

Objectives

§ Minimize the computational costs incurred by privacy preservation.

§ Provide an end-to-end privacy-preserving outsourced data classification service.

§ Enable a set of mutually untrusted data owners to gain a global view of the union of their data without breaching the privacy of any of them.

§ Enable dynamic data model updates when new training data samples become available.

Preliminary results

§ We used a synthetic dataset for fraud detection in a B2B network.

§ This dataset contains 1000 bank transactions with 9 attributes each.

§ We compare our work to the Ciphermed framework [8].

Related work

Figure: taxonomy of privacy-preserving machine learning (PPML). Different ML algorithms: clustering [1], classification [2], association rule mining [3]. Different privacy-preservation objectives: ML output protection, original data protection, …. Different architectures: distributed [4], outsourced [5]. Privacy-preservation techniques: cryptographic techniques (secure multi-party computation and homomorphic encryption (SMC/HE), garbled circuits (GC), oblivious transfer (OT)) and non-cryptographic techniques (privacy-preserving data publishing). Recurring criteria: privacy, runtime, utility.

Design principles

Figure: the DynAmic Privacy Preserving machine Learning Framework (DAPPLE). Components: privacy-preserving classifier learning, privacy-preserving class prediction, and incremental update of the data model. Legend: DO_i, data owner i; Q_j, classification querier j; CSP, Classification Service Provider; [S_ki]_pk_i, encrypted local training data chunk from data owner DO_i; [w_k]_pk_w, encrypted data model; [X_j]_pk_j, encrypted classification query; [C_j]_pk_j, encrypted classification response.

§ Cryptography-based protection of the data model, the training data, and the classification queries and responses

§ Decent privacy and utility levels

§ Building blocks based on partially homomorphic encryption (PHE)

§ Efficient runtime

§ Entirely outsourced ML computations over encrypted data

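§ Combine PHE with cryptographic blinding, using the DTPKC cryptosystem [6]; for example, $[x]_{pk} \otimes [r]_{pk} = [x \oplus r]_{pk}$, where $[\cdot]_{pk}$ denotes encryption under public key $pk$, $\otimes$ the homomorphic operation on ciphertexts and $\oplus$ the corresponding operation on plaintexts. The protocol involves two parties, U_1 and U_2, and proceeds in five steps: (1) blind the inputs; (2) partially decrypt the blinded values; (3) decrypt the blinded values; (4) run the operation over the blinded values; (5) remove the blinding from the result.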
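The five blinding steps can be illustrated with a toy sketch. DTPKC [6] is a Paillier-style cryptosystem whose decryption can be split between two parties; the code below does not implement DTPKC itself. As a simplified stand-in it assumes textbook Paillier with g = N + 1 and a decryption exponent split additively into two shares S1 and S2 (a common two-party threshold Paillier construction), and applies the five steps to one concrete operation: multiplying two encrypted values. Which party performs which step, and all function names, are hypothetical, and the tiny primes make it insecure.

```python
import math
import secrets

# Toy parameters (hypothetical): real deployments use primes of 1024+ bits.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # math.lcm needs Python 3.9+

# Decryption exponent S with S = 0 (mod LAM) and S = 1 (mod N), split into
# two additive shares; in a real deployment each server holds one share.
S = LAM * pow(LAM, -1, N)
S1 = secrets.randbelow(LAM * N)   # share of U_1
S2 = (S - S1) % (LAM * N)         # share of U_2


def encrypt(m: int) -> int:
    """Paillier encryption [m] = (1 + N)^m * r^N mod N^2 with fresh random r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (pow(1 + N, m % N, N2) * pow(r, N, N2)) % N2


def h_add(c1: int, c2: int) -> int:
    """[a] (x) [b] = [a + b mod N]: multiplying ciphertexts adds plaintexts."""
    return (c1 * c2) % N2


def h_scale(c: int, k: int) -> int:
    """[a]^k = [k * a mod N]; a negative k is reduced modulo N."""
    return pow(c, k % N, N2)


def partial_decrypt(c: int, share: int) -> int:
    """One server's partial decryption with its key share."""
    return pow(c, share, N2)


def combine(pd1: int, pd2: int) -> int:
    """Combine both partial decryptions: c^S = 1 + m*N (mod N^2)."""
    t = (pd1 * pd2) % N2
    return ((t - 1) // N) % N


def decrypt(c: int) -> int:
    """Full decryption with both shares, only used here to check results."""
    return combine(partial_decrypt(c, S1), partial_decrypt(c, S2))


def secure_multiply(cx: int, cy: int) -> int:
    """Compute [x * y mod N] from [x] and [y] via the five blinding steps."""
    # (1) U_1 blinds the inputs with uniform random masks r1, r2.
    r1, r2 = secrets.randbelow(N), secrets.randbelow(N)
    bx, by = h_add(cx, encrypt(r1)), h_add(cy, encrypt(r2))
    # (2) U_1 partially decrypts the blinded values with its share S1.
    px, py = partial_decrypt(bx, S1), partial_decrypt(by, S1)
    # (3) U_2 completes decryption with S2; it only learns x + r1 and y + r2.
    xb = combine(px, partial_decrypt(bx, S2))
    yb = combine(py, partial_decrypt(by, S2))
    # (4) U_2 runs the operation over the blinded values and re-encrypts.
    ch = encrypt(xb * yb)
    # (5) U_1 removes the blinding: x*y = h - x*r2 - y*r1 - r1*r2 (mod N).
    c = h_add(ch, h_scale(cx, -r2))
    c = h_add(c, h_scale(cy, -r1))
    return h_add(c, encrypt(-r1 * r2))


if __name__ == "__main__":
    print(decrypt(secure_multiply(encrypt(123), encrypt(456))))  # 56088
```

Because r1 and r2 are drawn uniformly modulo N, U_2 only sees x + r1 and y + r2, which reveal nothing about x and y, while U_1 never sees a plaintext. The multiplication was chosen only because it cannot be computed with additive homomorphism alone; other operations would follow the same five-step pattern.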

§ We implemented the VFDT incremental decision tree learning algorithm [7]. The naive approach combines low-level privacy-preserving (PP) building blocks; the 1st optimization uses inline building blocks; the 2nd optimization uses parallel computing.
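VFDT [7] grows a decision tree incrementally: each leaf keeps counts of (attribute, value, class) triples and splits once the Hoeffding bound guarantees, with probability 1 - δ, that the best attribute really beats the runner-up. The sketch below shows that split test in plaintext on categorical attributes with information gain; it does not show how DAPPLE evaluates these statistics over encrypted data. It omits VFDT refinements such as the "no split" candidate and re-checking only every n_min examples, and the class and method names and the default δ and τ values are hypothetical.

```python
import math
from collections import defaultdict


def entropy(class_counts) -> float:
    """Shannon entropy (bits) of a {label: count} mapping."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts.values() if c > 0)


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)) from the Hoeffding inequality."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))


class HoeffdingLeaf:
    """Sufficient statistics stats[attr][value][label] kept at one leaf."""

    def __init__(self, n_attributes: int):
        self.n_attributes = n_attributes
        self.n_seen = 0
        self.class_counts = defaultdict(int)
        self.stats = [defaultdict(lambda: defaultdict(int))
                      for _ in range(n_attributes)]

    def update(self, x, y):
        """Incrementally absorb one labelled example (x: tuple of categories)."""
        self.n_seen += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.stats[i][v][y] += 1

    def info_gain(self, i: int) -> float:
        """Information gain of splitting this leaf on attribute i."""
        remainder = sum((sum(counts.values()) / self.n_seen) * entropy(counts)
                        for counts in self.stats[i].values())
        return entropy(self.class_counts) - remainder

    def try_split(self, delta: float = 1e-7, tie_threshold: float = 0.05):
        """Return the attribute to split on, or None if evidence is insufficient."""
        if self.n_seen == 0 or len(self.class_counts) < 2:
            return None
        gains = sorted(((self.info_gain(i), i) for i in range(self.n_attributes)),
                       reverse=True)
        best_gain, best_attr = gains[0]
        second_gain = gains[1][0] if len(gains) > 1 else 0.0
        # Information gain ranges over [0, log2(#classes)].
        eps = hoeffding_bound(math.log2(len(self.class_counts)), delta, self.n_seen)
        if best_gain - second_gain > eps or eps < tie_threshold:
            return best_attr
        return None


if __name__ == "__main__":
    leaf = HoeffdingLeaf(n_attributes=2)
    for k in range(2000):
        a0, a1 = k % 2, (k // 2) % 2  # a0 determines the label, a1 is noise
        leaf.update((a0, a1), "fraud" if a0 else "legit")
    print(leaf.try_split())  # 0
```

Only counts are needed to decide a split, which is what makes the algorithm incremental: new training chunks just update the counts at the leaves, matching the objective of dynamic model updates.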

§ [1] X. Hu et al.: Privacy-Preserving K-Means Clustering Upon Negative Databases. ICONIP (4) 2018.

§ [2] S. Kim et al.: Privacy-Preserving Naive Bayes Classification Using Fully Homomorphic Encryption. ICONIP (4) 2018: 349-358

§ [3] L. Liu et al.: Privacy-Preserving Mining of Association Rule on Outsourced Cloud Data from Multiple Parties. ACISP 2018: 431-451

§ [4] H. Yu et al.: Privacy-Preserving SVM Classification on Vertically Partitioned Data. PAKDD 2006: 647-656

§ [5] T. Li et al.: Outsourced Privacy-Preserving Classification Service over Encrypted Data. J. Network and Computer Applications 106: 100-110 (2018)

§ [6] X. Liu et al.: An Efficient Privacy-Preserving Outsourced Calculation Toolkit With Multiple Keys. IEEE Trans. Information Forensics and Security 11(11): 2401-2414 (2016)

§ [7] P. Domingos, G. Hulten: Mining High-Speed Data Streams. KDD 2000: 71-80

§ [8] R. Bost et al.: Machine Learning Classification over Encrypted Data. NDSS 2015

2018 ACM/IFIP International Middleware Conference, Doctoral Symposium,