
From the document PRIVACY PRESERVING DATA MINING (pages 29-34)


Theorem 3.2. (Composition Theorem for the semi-honest model): Suppose that g is privately reducible to f and that there exists a protocol for privately computing f. Then there exists a protocol for privately computing g.

Proof. Refer to [36].

The above definitions and theorems are relative to the semi-honest model. This model guarantees that parties who correctly follow the protocol do not have to fear seeing data they are not supposed to; this is in fact sufficient for many practical applications of privacy-preserving data mining (e.g., where the concern is avoiding the cost of protecting private data). The malicious model (guaranteeing that a malicious party cannot obtain private information from an honest one, among other things) adds considerable complexity.

While many of the SMC-style protocols presented in this book do provide guarantees beyond that of the semi-honest model (such as guaranteeing that individual data items are not disclosed to a malicious party), few meet all the requirements of the malicious model. The definition above is sufficient for understanding this book; readers who wish to perform research in privacy-preserving data mining protocols based on secure multiparty computation are urged to study [36].

Apart from the prior formulation, Goldreich also discusses an alternative formulation for privacy using the real vs. ideal model philosophy. A scheme is considered to be secure if whatever a feasible adversary can obtain in the real model is also feasibly attainable in an ideal model. In this framework, one first considers an ideal model in which the (two) parties are joined by a (third) trusted party, and the computation is performed via this trusted party. Next, one considers the real model in which a real (two-party) protocol is executed without any trusted third parties. A protocol in the real model is said to be secure with respect to certain adversarial behavior if the possible real executions with such an adversary can be "simulated" in the corresponding ideal model. The notion of simulation used here is different from the one used in Definition 3.1: rather than simulating the view of a party via a traditional algorithm, the joint view of both parties needs to be simulated by the execution of an ideal-model protocol. Details can be found in [36].

3.3.1 Secure Circuit Evaluation

Perhaps the most important result to come out of the Secure Multiparty Computation community is a constructive proof that any polynomially computable function can be computed securely. This was accomplished by demonstrating that given a (polynomial size) boolean circuit with inputs split between parties, the circuit could be evaluated so that neither side would learn anything but the result. The idea is based on share splitting: the value for each "wire" in the circuit is split into two shares, such that the exclusive or of the two shares gives the true value. Say that the value on the wire should be 0; this could be accomplished by both parties having 1, or both having 0. However, from one party's point of view, holding a 0 gives no information about the true value: we know that the other party's value is the true value, but we don't know what the other party's value is.

Andrew Yao showed that we could use cryptographic techniques to compute random shares of the output of a gate given random shares of the input, such that the exclusive or of the outputs gives the correct value. (This was formalized by Goldreich et al. in [37].) To see this, let us view the case for a single gate, where each party holds one input. The two parties each choose a random bit, and provide the (randomly chosen) value $r$ to the other party. They then replace their own input $i$ with $i \oplus r$. Imagine the gate is an exclusive or: Party a then has $(i_a \oplus r_a)$ and $r_b$. Party a simply takes the exclusive or of these values to get $(i_a \oplus r_a) \oplus r_b$ as its share of the output. Party b likewise gets $(i_b \oplus r_b) \oplus r_a$ as its share. Note that neither has seen anything but a randomly chosen bit from the other party, so clearly no information has been passed. However, the exclusive or of the two results is:

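$$
\begin{aligned}
o &= o_a \oplus o_b \\
  &= ((i_a \oplus r_a) \oplus r_b) \oplus ((i_b \oplus r_b) \oplus r_a) \\
  &= i_a \oplus i_b \oplus r_a \oplus r_a \oplus r_b \oplus r_b
\end{aligned}
$$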

Since any value exclusive-ored with itself is 0, the random values cancel out and we are left with the correct result.
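The cancellation can be checked mechanically. The following is a minimal sketch of the single exclusive-or gate described above; the function name `xor_gate_shares` is an illustrative assumption, not something defined in the text.

```python
import itertools
import secrets
from typing import Tuple

def xor_gate_shares(i_a: int, i_b: int, r_a: int, r_b: int) -> Tuple[int, int]:
    """Return the output shares (o_a, o_b) of an exclusive-or gate.

    Party a masks its input with its own random bit r_a and receives r_b;
    party b masks its input with r_b and receives r_a.
    """
    o_a = (i_a ^ r_a) ^ r_b  # computed locally by party a
    o_b = (i_b ^ r_b) ^ r_a  # computed locally by party b
    return o_a, o_b

# Exhaustive check over all inputs and all choices of the random bits.
for i_a, i_b, r_a, r_b in itertools.product((0, 1), repeat=4):
    o_a, o_b = xor_gate_shares(i_a, i_b, r_a, r_b)
    assert o_a ^ o_b == i_a ^ i_b

# One run with freshly drawn random bits.
o_a, o_b = xor_gate_shares(1, 0, secrets.randbelow(2), secrets.randbelow(2))
print(o_a ^ o_b)  # 1
```

Each share on its own depends on a random bit chosen by the other party, which is why neither share reveals the other party's input.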

Inverting a value is also easy; one party simply inverts its random share; the other does nothing (both know the circuit, it is just the inputs that are private). The AND operation is more difficult, and involves a cryptographic protocol known as oblivious transfer. If we start with the random shares as described above, Party a randomly chooses its output $o_a$ and constructs a table as follows:

Note that given Party b's shares of the input (first line), the exclusive or of $o_a$ with $o_b$ (the second line) cancels out $o_a$, leaving the correct output for the gate. But the (randomly chosen) $o_a$ hides this from Party b.

The cryptographic oblivious transfer protocol allows Party b to get the correct bit from the second row of this table, without being able to see any of the other bits or revealing to Party a which entry was chosen.
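As a rough illustration of the AND gate, the sketch below assumes a table with one entry per possible pair of Party b's input shares, each entry being the gate output masked by $o_a$. The share names (`x_a`, `y_a`, `x_b`, `y_b`) and all function names are hypothetical, and the oblivious transfer is replaced by a plain lookup, so the code checks correctness only and provides no privacy.

```python
import itertools
import secrets

def build_and_table(x_a, y_a, o_a):
    """Party a's table: one masked output bit per possible pair of b's shares."""
    return {
        (x_b, y_b): o_a ^ ((x_a ^ x_b) & (y_a ^ y_b))
        for x_b, y_b in itertools.product((0, 1), repeat=2)
    }

def ideal_oblivious_transfer(table, choice):
    """Stand-in for 1-out-of-4 oblivious transfer (assumption, not a real protocol).

    A real protocol would hide `choice` from party a and the other three
    entries from party b; this placeholder simply returns the chosen entry.
    """
    return table[choice]

def and_gate_shares(x_a, y_a, x_b, y_b):
    """Return output shares (o_a, o_b) whose exclusive or is the AND of the wires."""
    o_a = secrets.randbelow(2)                         # party a's random output share
    table = build_and_table(x_a, y_a, o_a)             # built by party a alone
    o_b = ideal_oblivious_transfer(table, (x_b, y_b))  # selected by party b's shares
    return o_a, o_b

# The true wire values are x = x_a ^ x_b and y = y_a ^ y_b.
for x_a, y_a, x_b, y_b in itertools.product((0, 1), repeat=4):
    o_a, o_b = and_gate_shares(x_a, y_a, x_b, y_b)
    assert o_a ^ o_b == (x_a ^ x_b) & (y_a ^ y_b)
```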

Repeating this process allows computing any arbitrarily large circuit (for details on the process, proof, and why it is limited to polynomial size see [36]). The problem is that for data mining on large data sets, the number of inputs and size of the circuit become very large, and the computation cost becomes prohibitive. However, this method does enable efficient computation of functions of small inputs (such as comparing two numbers), and is used frequently as a subroutine in privacy-preserving data mining algorithms based on the secure multiparty computation model.

3.3.2 Secure Sum

We now go through a short example of secure computation to give a flavor of the overall idea - Secure sum. The secure sum problem is rather simple but extremely useful. Distributed data mining algorithms frequently calculate the sum of values from individual sites and thus use it as an underlying primitive.

The problem is defined as follows: Once again, we assume $k$ parties, $P_1, \ldots, P_k$. Party $P_i$ has a private value $x_i$. Together they want to compute the sum $S = \sum_{i=1}^{k} x_i$ in a secure fashion (i.e., without revealing anything except the final result). One other assumption is that the range of the sum is known (i.e., an upper bound on the sum). Thus, we assume that the sum $S$ is a number in the field $\mathcal{F}$. Assuming at least 3 parties, the following protocol computes such a sum:

• $P_1$ generates a random number $r$ from a uniform random distribution over the field $\mathcal{F}$.

• $P_1$ computes $S_1 = x_1 + r \bmod |\mathcal{F}|$ and sends it to $P_2$.

• For parties $P_2, \ldots, P_{k-1}$:

- $P_i$ receives $S_{i-1} = r + \sum_{j=1}^{i-1} x_j \bmod |\mathcal{F}|$.

- $P_i$ computes $S_i = S_{i-1} + x_i \bmod |\mathcal{F}| = r + \sum_{j=1}^{i} x_j \bmod |\mathcal{F}|$ and sends it to site $P_{i+1}$.

• $P_k$ receives $S_{k-1} = r + \sum_{j=1}^{k-1} x_j \bmod |\mathcal{F}|$.

• $P_k$ computes $S_k = S_{k-1} + x_k \bmod |\mathcal{F}| = r + \sum_{j=1}^{k} x_j \bmod |\mathcal{F}|$ and sends it to site $P_1$.

• $P_1$ computes $S = S_k - r \bmod |\mathcal{F}| = \sum_{j=1}^{k} x_j \bmod |\mathcal{F}|$ and sends it to all other parties as well.

Figure 3.3 depicts how this method operates on an example with 4 parties.
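The protocol can be simulated in a single process. The sketch below is illustrative only: the function name `secure_sum` is an assumption, the random mask is passed in as a parameter so the run is reproducible (a real run would draw it uniformly), and the inputs are the values of the four-party example of Figure 3.3.

```python
from typing import List, Tuple

def secure_sum(values: List[int], modulus: int, r: int) -> Tuple[int, List[int]]:
    """Simulate the ring protocol; return (sum mod modulus, messages sent).

    values[0] belongs to the initiator P_1, and r is P_1's random mask.
    """
    if len(values) < 3:
        raise ValueError("the protocol assumes at least 3 parties")
    messages = []
    running = (values[0] + r) % modulus   # P_1 sends S_1 to P_2
    messages.append(running)
    for x in values[1:]:                  # P_2, ..., P_k add their values in turn
        running = (running + x) % modulus
        messages.append(running)
    total = (running - r) % modulus       # P_1 removes its mask from S_k
    return total, messages

total, messages = secure_sum([5, 12, 4, 17], modulus=50, r=32)
print(messages)  # [37, 49, 3, 20]
print(total)     # 38
```

Every message in the list is what a single site sees on its incoming link; only the initiator, who knows `r`, can turn the last message into the sum.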

Fig. 3.3. Secure computation of a sum. (Example with $|\mathcal{F}| = 50$ and $r = 32$: Site 1 holds 5 and sends $32 + 5 \bmod 50 = 37$; Site 2 holds 12 and sends $37 + 12 \bmod 50 = 49$; Site 3 holds 4 and sends $49 + 4 \bmod 50 = 3$; the fourth site holds 17 and sends $3 + 17 \bmod 50 = 20$; Site 1 then recovers $20 - r = -12 \bmod 50 = 38$.)

The above protocol is secure in the SMC sense. The proof of security consists of showing how to simulate the messages received. Once those can be simulated in polynomial time, the messages sent can be easily computed. The basic idea is that every party (except $P_1$) only sees messages masked by a random number unknown to it, while $P_1$ only sees the final result. So, nothing new is learned by any party. Formally, $P_i$ ($i = 2, \ldots, k$) gets the message $S_{i-1} = r + \sum_{j=1}^{i-1} x_j$.

\begin{align}
\Pr(S_{i-1} = a) &= \Pr\Bigl(r + \sum_{j=1}^{i-1} x_j = a\Bigr) \tag{3.1} \\
&= \Pr\Bigl(r = a - \sum_{j=1}^{i-1} x_j\Bigr) \tag{3.2} \\
&= \frac{1}{|\mathcal{F}|} \tag{3.3}
\end{align}

Thus, $S_{i-1}$ can be simulated simply by randomly choosing a number from a uniform distribution over $\mathcal{F}$. Since $P_1$ knows the final result and the number $r$ it chose, it can simulate the message it gets as well. Note that $P_1$ can also determine $\sum_{j=2}^{k} x_j$ by subtracting $x_1$. This is possible from the global result regardless of how it is computed, so $P_1$ has not learned anything from the computation.
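The uniformity claim can be checked by brute force for a small modulus. The following sketch is an illustration rather than part of the proof; the function name `message_distribution` and the chosen numbers are assumptions. It enumerates every possible mask for a fixed partial sum and counts the resulting messages.

```python
from collections import Counter

def message_distribution(partial_sum: int, modulus: int) -> Counter:
    """Count how often each message value occurs as r ranges over 0..modulus-1."""
    return Counter((r + partial_sum) % modulus for r in range(modulus))

dist = message_distribution(partial_sum=17, modulus=50)
# Each value a in {0, ..., 49} is produced by exactly one r, so a uniformly
# chosen r gives Pr(S_{i-1} = a) = 1/50 regardless of the partial sum.
assert len(dist) == 50 and set(dist.values()) == {1}
```

The same count comes out for any other partial sum, which is why a simulator can draw the message uniformly without knowing the private values.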

In the protocol presented above, $P_1$ is designated as the initiator and the parties are ordered numerically (i.e., messages go from $P_i$ to $P_{i+1}$). However, there is no special reason for either of these. Any party could be selected to initiate the protocol and receive the sum at the end. The order of the parties can also be scrambled (as long as every party does have the chance to add its private input).

This method faces an obvious problem if sites collude. Sites $P_{l-1}$ and $P_{l+1}$ can compare the values they send/receive to determine the exact value for $x_l$. The method can be extended to work for an honest majority. Each site divides $x_l$ into shares. The sum for each share is computed individually. However, the path used is permuted for each share, such that no site has the same neighbor twice. To compute $x_l$, the neighbors of $P_l$ from each iteration would have to collude. Varying the number of shares varies the number of dishonest (colluding) parties required to violate security.
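Both the collusion attack and the share-splitting idea can be sketched in a few lines. Everything named below is a hypothetical illustration: the shares are assumed to be additive modulo the field size, the ring order is merely shuffled per round (the sketch does not enforce that no site has the same neighbor twice), and `random.Random` is used only for reproducibility, whereas a real implementation would need a cryptographically secure source.

```python
import random

def ring_messages(values, modulus, r):
    """Messages S_1, ..., S_k sent around the ring (values[0] is the initiator)."""
    messages, running = [], r
    for x in values:
        running = (running + x) % modulus
        messages.append(running)
    return messages

def colluding_neighbors_recover(messages, l, modulus):
    """P_{l-1} sent S_{l-1} to P_l and P_{l+1} received S_l (1-based l, 2 <= l <= k)."""
    return (messages[l - 1] - messages[l - 2]) % modulus

def split_into_shares(x, n_shares, modulus, rng):
    """Additive shares of x: random values that sum to x modulo `modulus`."""
    shares = [rng.randrange(modulus) for _ in range(n_shares - 1)]
    shares.append((x - sum(shares)) % modulus)
    return shares

def shared_secure_sum(values, n_shares, modulus, rng):
    """Sum each share position in its own round, shuffling the ring order per round."""
    shares = [split_into_shares(x, n_shares, modulus, rng) for x in values]
    total = 0
    for round_idx in range(n_shares):
        order = list(range(len(values)))
        rng.shuffle(order)                # simplification: neighbors may repeat
        r = rng.randrange(modulus)        # fresh mask for this round
        msgs = ring_messages([shares[p][round_idx] for p in order], modulus, r)
        total = (total + msgs[-1] - r) % modulus
    return total

rng = random.Random(0)
values, modulus = [5, 12, 4, 17], 50
msgs = ring_messages(values, modulus, r=32)
print(colluding_neighbors_recover(msgs, l=3, modulus=modulus))           # 4
print(shared_secure_sum(values, n_shares=3, modulus=modulus, rng=rng))   # 38
```

In the single-ring run, the two neighbors of the third site recover its value 4 exactly; in the shared variant, a pair of neighbors in one round learns only one random-looking share.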

One problem with both the randomization and cryptographic SMC approaches is that unique secure solutions are required for every single data mining problem. While many of the building blocks used in these solutions are the same, this still remains a tremendous task, especially when considering the sheer number of different approaches possible. One possible way around this problem is to somehow transform the domain of the problem in a way that would make different data mining possible without requiring too much customization.
