Distributed correlation generators


Distributed Correlation Generators

by

Joseph Hui

Submitted to the Department of Electrical Engineering and Computer

Science

in partial fulfillment of the requirements for the degree of

Master of Science in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2018

© Massachusetts Institute of Technology 2018. All rights reserved.

Author: Signature redacted

Department of Electrical Engineering and Computer Science

May 23, 2018

Certified by: Signature redacted

Vinod Vaikuntanathan

Associate Professor of Electrical Engineering and Computer Science

Thesis Supervisor

Accepted by: Signature redacted

Leslie A. Kolodziejski

Professor of Electrical Engineering and Computer Science

Chair, Department Committee on Graduate Students


MITLibraries

77 Massachusetts Avenue

Cambridge, MA 02139

http://libraries.mit.edu/ask

DISCLAIMER NOTICE

The pagination in this thesis reflects how it was delivered to the

Institute Archives and Special Collections.

The Table of Contents does not accurately represent the

page numbering.


Distributed Correlation Generators

by

Joseph Hui

Submitted to the Department of Electrical Engineering and Computer Science on May 21, 2018, in partial fulfillment of the

requirements for the degree of

Master of Science in Electrical Engineering and Computer Science

Abstract

We study the problem of distributed correlation generators wherein $n$ parties wish to simulate unbounded samples from a joint distribution $D = D_1 \times D_2 \times \cdots \times D_n$, once they are initialized using randomness sampled from a (possibly different) correlated distribution. We wish to ensure that these samples are computationally indistinguishable from i.i.d. samples from $D$. Furthermore, we wish to ensure security even against an adversary who corrupts a subset of the parties and obtains their internal (initialization) state.

Our contributions are three-fold. First, we define the notion of distributed (non-interactive) correlation generators and show its connection to other cryptographic primitives. Secondly, assuming the existence of indistinguishability obfuscators, we show a construction of distributed correlation generators for a large and natural class of joint distributions that we call conditionally sampleable distributions. Finally, we show a construction for the subclass of additive-spooky distributions assuming private constrained pseudorandom functions (private CPRFs).

Thesis Supervisor: Vinod Vaikuntanathan

Title: Associate Professor


List of Figures

2-1 The sampling procedure $\mathsf{JointSample}_\Pi(1^\lambda)$ defined by the permutation $\Pi$

4-1 The constraint for the additive-spooky scheme

4-2 The seed generation for the additive-spooky scheme

5-1 The program to be obfuscated

A-1 The program with reordered sampling

A-2 The program with only necessary keys

A-3 Switching from one ordering to another


Chapter 1 Introduction

We are interested in understanding the class of joint distributions $D_1 \times \cdots \times D_n$ that $n$ parties can simulate, given randomness sampled from a correlated distribution. Several works have explored this problem in various settings:

* Secure sampling for two parties [16, 15, 17]: two parties output samples of a pair of jointly distributed random variables such that neither party learns more about the other party's output than what its own output reveals.

* Non-interactive simulation for two parties [9, 13, 11]: the parties need to simulate $D_1 \times D_2$ without communicating with each other.

* Complexity of correlated randomness in secure multiparty computation [12]: given a small amount of correlated randomness, $n$ parties need to securely compute a function $f$ on their private inputs.

We continue in this line of research. Specifically, we study the following question: which classes of joint distributions $D_1 \times \cdots \times D_n$ can $n$ parties non-interactively and securely simulate using a small amount of correlated randomness? We consider a relaxed requirement: we do not require perfect simulation of $D_1 \times \cdots \times D_n$, and in fact we only require the output distribution of the parties to be computationally close to $D_1 \times \cdots \times D_n$. This means that any probabilistic polynomial time distinguisher should be unable to tell apart a sample from $D_1 \times \cdots \times D_n$ from the joint outputs of the parties. We note that all the above works offer only a partial answer to this question. For instance, the works on secure sampling [16, 15, 17] offer interactive solutions. The works on non-interactive simulation [9, 13, 11], however, do not have any security guarantees.

We address the aforementioned question by introducing the notion of distributed correlation generators (DCG). This notion corresponds to a joint distribution $D_1 \times \cdots \times D_n$ which is efficiently sampleable. It consists of the following two probabilistic polynomial time algorithms:

* Seed Generator, $\mathsf{SeedGen}(1^\lambda, 1^n)$: On input the security parameter $\lambda$ and the number of parties $n$, it outputs the initial seeds $s_1, \ldots, s_n$. The $i$th party gets the seed $s_i$.

* Instance Generator, $\mathsf{InstGen}(s_i, i, \mathsf{id})$: On input $s_i$, the seed of the $i$th party, and a session identifier $\mathsf{id}$, it outputs a sample $x_{\mathsf{id},i}$. The $i$th party in a session $\mathsf{id}$ outputs the sample $x_{\mathsf{id},i}$.

Ideally, we require that the above scheme satisfies the following correctness guarantee: the joint output distribution of the parties in every session should effectively simulate sampling from $D_1 \times \cdots \times D_n$ and, in particular, should be independent of all the sessions so far. This is formalized by requiring that the output distribution of the parties in every session be computationally indistinguishable from a fresh sample drawn from $D_1 \times \cdots \times D_n$.

Satisfying correctness alone is insufficient for many of our applications. In particular, we need to handle the case when the distinguisher is given a subset of seeds, $\{s_i\}_{i \in S}$ for some subset $S \subseteq [n]$. Indeed, achieving correctness alone is easy: all the parties get a PRF key $K$ and in session $\mathsf{id}$, the $i$th party samples from $D_1 \times \cdots \times D_n$ using randomness $\mathsf{PRF}(K, \mathsf{id})$ and finally outputs the $i$th coordinate of this sample. Defining security involves some challenges and, in particular, it restricts the type of joint distributions $D_1 \times \cdots \times D_n$ that distributed correlation generators can simulate. We discuss this next.
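The correctness-only construction described above (every party holds the same PRF key) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's construction: HMAC-SHA256 stands in for the PRF, Python's non-cryptographic `random.Random` stands in for the sampler's random tape, and the toy joint distribution ($n$ bits that XOR to 0) as well as the names `prf`, `sample_joint`, `seed_gen` and `inst_gen` are hypothetical.

```python
import hashlib
import hmac
import random
import secrets


def prf(key, session_id):
    """Stand-in PRF (assumption): HMAC-SHA256 of the session identifier."""
    return hmac.new(key, session_id.to_bytes(8, "big"), hashlib.sha256).digest()


def sample_joint(rng, n):
    """Toy joint distribution (assumption): n bits whose XOR is 0."""
    bits = [rng.getrandbits(1) for _ in range(n - 1)]
    parity = 0
    for b in bits:
        parity ^= b
    return tuple(bits + [parity])


def seed_gen(n):
    """Every party receives the SAME key, so any corrupted party learns everything."""
    key = secrets.token_bytes(32)
    return [key] * n


def inst_gen(seed, i, session_id, n):
    """Party i re-derives the whole joint sample and outputs only coordinate i."""
    rng = random.Random(prf(seed, session_id))  # not a cryptographic generator
    return sample_joint(rng, n)[i]
```

Because all seeds are identical, a single corrupted party can recompute every other party's output, which is why this scheme satisfies correctness but not the security notion discussed next.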

Defining Security. For simplicity, we illustrate our definition in the case of two parties. Intuitively, our definition gives us the following security guarantee. Suppose an adversary corrupts party 1 so that she can see the sequence of samples $a_1, a_2, \ldots$ that party 1 outputs. We require that from her view, the sequence of samples $b_1, b_2, \ldots$ that party 2 outputs is computationally indistinguishable from fresh samples $b'_1, b'_2, \ldots$ where each $b'_i$ is sampled independently from the distribution $D_2$ conditioned on $D_1 = a_i$. Although this guarantee can in principle be defined for arbitrary distributions, we consider conditionally sampleable distributions, where the conditional distribution of $D_2$ conditioned on $D_1 = a$ (resp. $D_1$ conditioned on $D_2 = b$) is efficiently sampleable.

Connections to FSS and Additive-Spooky Encryption. We start by observing connections between these primitives and distributed correlation generators for a class of distributions called additive-spooky distributions [8], defined as follows. Here, $D_1 \times D_2$ associated with a function $f$ is defined as follows: sample $x \leftarrow X$ and $y \leftarrow Y$ uniformly at random, and $z, w$ uniformly at random such that $z \oplus w = f(x, y)$. The output of $D_1$ is $(x, z)$ and that of $D_2$ is $(y, w)$. We observe that DCGs can be constructed from function secret-sharing schemes, following [3, 4]. The application of FSS towards generating certain correlated distributions is similar to a portion of [2], which uses homomorphic secret sharing (HSS) to generate certain correlated distributions, such as bilinear correlations. These connections are summarized in Section 3.1.

Constructions for Additive-Spooky Distributions. Our main contribution is a construction of DCGs for 2-party additive-spooky distributions (corresponding to a Boolean function $f$) starting from any (single key) private constrained PRF. Roughly speaking, our construction proceeds as follows. Alice and Bob each generate their corresponding inputs to $f$ using independent PRF keys. Alice also receives a private CPRF key, and Bob receives a derived key with a particular constraint (which depends on $f(x, y)$ and contains Alice's and Bob's hardcoded PRF keys). This allows Bob to learn something that depends on $f(x, y)$: in particular, he either learns the same bit as Alice, or a random bit. When we XOR these two bits together, we get either 0 (if Bob learns the same bit as Alice), or a random bit (otherwise). By adding more CPRF keys, we can amplify this so that the XOR is either 0, or 1 with high probability, depending on whether $f(x, y)$ is 0 or 1. However, Alice still learns nothing (because she does not even see the derived key), and Bob also learns nothing (because the key hides the constraint). For the detailed construction and proof, the reader is referred to Section 4.1.

number of random oblivious transfer (OT) correlations using a fixed amount of correlated randomness. Indeed, in the special case of random OT, we are able to simplify the construction somewhat, and relax the assumption to just constrained PRFs (without privacy). Furthermore, building on the standard transformation from random OT to OT, we show how to achieve $n$ instances of OT with communication $3n + o(n)$, which improves on the work of Boyle, Gilboa and Ishai [5] who achieve $4n + o(n)$, albeit at the price of achieving the weaker indistinguishability security.

Constructions for General Conditionally Sampleable Distributions. Finally,

we show how to construct DCGs for arbitrary conditionally sampleable distributions

assuming the existence of secure indistinguishability obfuscation schemes. For more

details, we refer the reader to Section 5.1.

Chapter 2 Distributed Correlation Generators

2.1 Distributed Correlation Generators

We define the notion of distributed correlation generators DCG for a distribution $D$. It consists of two probabilistic polynomial time algorithms SeedGen and InstGen defined below.

* Seed Generator, $\mathsf{SeedGen}(1^\lambda, 1^n)$: On input the security parameter $\lambda$ and the number of parties $n$, it outputs the initial seeds $s_1, \ldots, s_n$.

* Instance Generator, $\mathsf{InstGen}(s_i, i, \mathsf{id})$: On input $s_i$, the seed of the $i$th party, and a session identifier $\mathsf{id}$, it outputs a sample $x_{\mathsf{id},i}$.

We present our security definitions for a class of distributions that we call conditionally sampleable distributions. A conditionally sampleable distribution is a joint distribution on variables $(X_1, \ldots, X_n)$ which admits a polynomial-time sampler Sample. This sampler allows efficiently sampling from the distribution one variable at a time, in any order, to build up a sample from the joint distribution.

We present the formal definition below. We think of Sample in the definition as taking an input index $j \in [n]$ to sample next, indices $i_1, \ldots, i_m \in [n]$ that have already been sampled, and the values $(x_{i_1}, \ldots, x_{i_m})$ that were already sampled, and outputting the next value $x_j$ sampled according to $X_j$ conditioned on the previously sampled values.

Definition 2.1.1 (Conditionally Sampleable Distributions). A distribution $D$ defined on random variables $X_1, \ldots, X_n$ is a conditionally sampleable distribution if there exists a PPT algorithm $\mathsf{Sample}(j, i_1, \ldots, i_m, x_{i_1}, \ldots, x_{i_m})$ such that the following holds for every permutation $\Pi$ on $[n]$:

$$D \approx \mathsf{JointSample}_\Pi(1^\lambda),$$

where $\mathsf{JointSample}_\Pi$ is defined in Figure 2-1.

For $i$ in $1, \ldots, n$:

$x'_{\Pi(i)} \leftarrow \mathsf{Sample}(\Pi(i), \Pi(1), \Pi(2), \ldots, \Pi(i-1), x'_{\Pi(1)}, \ldots, x'_{\Pi(i-1)})$

Output $(x'_1, \ldots, x'_n)$

Figure 2-1: The sampling procedure $\mathsf{JointSample}_\Pi(1^\lambda)$ defined by the permutation $\Pi$.

We present the correctness and security definitions for distributed correlation generators associated with conditionally sampleable distributions. Roughly speaking, a distributed correlation generator associated with a conditionally sampleable distribution $D$ is correct if, for any session ID and $(s_1, \ldots, s_n)$ generated by SeedGen, the distribution $(\mathsf{InstGen}(s_1, 1, \mathsf{ID}), \ldots, \mathsf{InstGen}(s_n, n, \mathsf{ID}))$ is computationally indistinguishable from $(x_1, \ldots, x_n)$ sampled according to $D$. In terms of security, we require that an adversary, given the seeds $\{s_i\}_{i \in S}$ for a (non-empty) subset $S \subseteq [n]$, should not be able to distinguish $\{\mathsf{InstGen}(s_i, i, \mathsf{ID})\}_{i \notin S}$ from $\{x_i\}_{i \notin S}$, where $\{x_i\}_{i \notin S}$ is sampled according to $D$ conditioned on the samples that the adversary has access to. If we allow $S$ to be empty, then this definition subsumes the correctness definition, so we present a single definition for both.
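The procedure of Figure 2-1 can be sketched in a few lines. The example below is illustrative only: the toy distribution (uniform over $n$-bit strings whose XOR is 0) and the names `sample` and `joint_sample` are assumptions made for this sketch. The distribution is conditionally sampleable because, in any order, the first $n-1$ coordinates are uniform bits and the last coordinate is determined by the others.

```python
import random


def sample(n, j, done_indices, done_values, rng):
    """Hypothetical Sample for the toy distribution: n bits XORing to 0.

    j is the index to sample next; done_indices and done_values hold the
    coordinates already fixed (j is unused here because the toy
    distribution is symmetric in its coordinates).
    """
    if len(done_indices) < n - 1:
        return rng.getrandbits(1)  # still unconstrained: a uniform bit
    parity = 0
    for v in done_values:
        parity ^= v
    return parity  # last coordinate is forced by the XOR-to-0 constraint


def joint_sample(n, perm, rng):
    """JointSample for the order perm[0], perm[1], ... (0-indexed permutation)."""
    x = [None] * n
    done_indices, done_values = [], []
    for i in range(n):
        j = perm[i]
        x[j] = sample(n, j, done_indices, done_values, rng)
        done_indices.append(j)
        done_values.append(x[j])
    return tuple(x)
```

For example, `joint_sample(3, [2, 0, 1], random.Random())` samples the third coordinate first and the second coordinate last; for every ordering the output is a uniformly random 3-bit string with even parity.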

Definition 2.1.2 (Correctness and security of DCGs). Let $n \geq 2$. Consider a distributed correlation generator DCG = (SeedGen, InstGen) for a conditionally sampleable distribution $D$. The distributed correlation generator is secure if for every probabilistic polynomial time adversary $\mathcal{A}$ and every set $S \subseteq [n]$, the adversary cannot distinguish between the following two experiments.

* In the first experiment, in the setup phase, the adversary receives $s_i$ for all $i \in S$, where $(s_1, \ldots, s_n) \leftarrow \mathsf{SeedGen}(1^\lambda, 1^n)$. In the sampling phase, the adversary makes queries of the form $(i, \mathsf{id})$, where $i \in [n]$, to an oracle that computes $\mathsf{InstGen}(s_i, i, \mathsf{id})$.

* In the second experiment, the adversary receives the same seeds $s_i$. However, in the sampling phase, she makes queries to an oracle that computes fresh samples conditioned on the seeds of corrupted parties. That is, for a query $(i, \mathsf{id})$, if $i \in S$, the oracle will answer with $\mathsf{InstGen}(s_i, i, \mathsf{id})$. But if $i \notin S$, the oracle will answer with $\mathsf{Sample}(i, i_1, i_2, \ldots, i_m, x_{i_1}, x_{i_2}, \ldots, x_{i_m})$, where for every $j \in [m]$ either $i_j \in S$, in which case $x_{i_j} = \mathsf{InstGen}(s_{i_j}, i_j, \mathsf{id})$, or $i_j \notin S$, in which case $x_{i_j}$


Chapter 3


3.1 Relation to Other Primitives

Distributed correlation generators are closely related to several other primitives. In

this section, we will consider a special class of distributions called "additive-spooky"

distributions which are defined as follows.

Definition 3.1.1 (Additive-spooky distributions). Consider a binary function $f$ on two inputs and two efficiently sampleable distributions $X$ and $Y$. The additive-spooky distribution associated with $f, X, Y$ is defined by the following sampling procedure: sample $x$ from $X$ and $y$ from $Y$, plus a random bit $b$. The output of the sample is $((x, b), (y, b \oplus f(x, y)))$.

Note that without loss of generality, we may assume $X$ and $Y$ are uniformly random, because $f$ itself can interpret these as random tapes used to efficiently sample the corresponding distributions.
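As a concrete illustration of Definition 3.1.1, the following sketch draws one sample from an additive-spooky distribution with uniformly random inputs. The function name `sample_additive_spooky`, the bit lengths, and the example function `f` are hypothetical choices for this sketch.

```python
import secrets


def sample_additive_spooky(f, x_bits=8, y_bits=8):
    """Return ((x, b), (y, b XOR f(x, y))) with x, y, b uniformly random."""
    x = secrets.randbits(x_bits)
    y = secrets.randbits(y_bits)
    b = secrets.randbits(1)
    return (x, b), (y, b ^ f(x, y))


def f(x, y):
    """Example binary function (assumption): the low bit of x AND y."""
    return (x & y) & 1
```

Calling `sample_additive_spooky(f)` yields a pair of outputs whose two bits XOR to `f(x, y)`, while each bit on its own is uniformly random.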

We will discuss various constructions and implications of distributed correlation generators for additive-spooky distributions. 2-party additive-spooky DCGs are implied by 2-party function secret sharing ([3]), which can be constructed from the DDH or factoring assumptions (or, as we will show, from single-key private CPRFs, based on the LWE assumption). In turn, AFS-spooky encryption implies FSS, not only for 2 parties but for any number of parties (see Section 6.3 of [8]). AFS-spooky encryption can be constructed based on LWE (Theorem 2 of [8]). These relationships are illustrated below.

[Figure: diagram relating LWE, AFS-spooky encryption, DDH and factoring, single-key private CPRFs, function secret sharing (2 parties), and distributed correlation generators (2 parties).]

Theorem 3.1.2. Assuming two-party FSS and PRFs exist, there exists a distributed correlation generator for any two-party additive-spooky distribution.

Proof. Let $f$ be a binary function (corresponding to the additive-spooky distribution of interest). Consider an FSS scheme (Gen, Eval) for $f$ with the natural output domain $\{0, 1\}$ and output decoder $\oplus$ (bit XOR). Let $g$ be a PRF.

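Construction: We will construct a distributed correlation generator. The seed generator and instance generator are as follows.

* $\mathsf{SeedGen}(1^\lambda, 1^n)$: Sample PRF keys $K_1, K_2$. Let $h$ be defined as $h(\mathsf{id}) = f(g_{K_1}(\mathsf{id}), g_{K_2}(\mathsf{id}))$. Compute FSS keys $(k_1, k_2) = \mathsf{Gen}(1^\lambda, h)$. Output $s_i = (K_i, k_i)$.

* $\mathsf{InstGen}(s_i, i, \mathsf{id})$: Output $(g_{K_i}(\mathsf{id}), \mathsf{Eval}(i, k_i, \mathsf{id}))$.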

We now need to analyze the two cases in the security definition (one corrupted party and correctness).

One corrupted party. Without loss of generality, suppose that the adversary corrupts the first party. We consider a series of hybrids below.

Hybrid $\mathcal{H}_0$: (real scheme) In this hybrid, the adversary receives $s_1$ and oracle access to $(g_{K_2}(\mathsf{id}), \mathsf{Eval}(2, k_2, \mathsf{id}))$.

Hybrid $\mathcal{H}_1$: (correct additive-spooky output) In the oracle, we replace the second party's bit output with the bit $b$ that makes the additive-spooky output correct. In this hybrid, when the oracle is queried on the second party's bit, it returns $h(\mathsf{id}) \oplus \mathsf{Eval}(1, k_1, \mathsf{id})$.

Claim 1. $\mathcal{A}$ cannot distinguish between $\mathcal{H}_0$ and $\mathcal{H}_1$.

Proof. By the correctness of FSS, $\mathsf{Eval}(1, k_1, \mathsf{id}) \oplus \mathsf{Eval}(2, k_2, \mathsf{id}) = h(\mathsf{id})$, therefore $\mathsf{Eval}(2, k_2, \mathsf{id}) = h(\mathsf{id}) \oplus \mathsf{Eval}(1, k_1, \mathsf{id})$. In other words, $\mathcal{H}_0 = \mathcal{H}_1$.

Hybrid $\mathcal{H}_2$: (fresh key) In the oracle, we sample a fresh key $K_2'$ and use it to compute the second party's input to $f$. Instead of generating the input as $g_{K_2}(\mathsf{id})$, the oracle replies with $g_{K_2'}(\mathsf{id})$ (the second party's bit is computed to ensure correctness, as before).

Claim 2. $\mathcal{A}$ cannot distinguish between $\mathcal{H}_1$ and $\mathcal{H}_2$.

Proof. Suppose not. Then we can break the security of FSS. An equivalent description of this hybrid is that the oracle replies with $g_{K_2}(\mathsf{id})$, but the first party receives $k_1'$, which was computed from $K_1, K_2'$. Consider two functions $h(\mathsf{id}) = f(g_{K_1}(\mathsf{id}), g_{K_2}(\mathsf{id}))$ and $h'(\mathsf{id}) = f(g_{K_1}(\mathsf{id}), g_{K_2'}(\mathsf{id}))$. Our adversary asks the FSS challenger for a key corresponding either to $h$ or $h'$, then simulates the oracle using $K_2$ and computing $b$ as previously described. If the key corresponds to $h$, then this experiment is hybrid $\mathcal{H}_1$; otherwise, it is $\mathcal{H}_2$. Therefore, we break the security of FSS, a contradiction. $\square$

Hybrid $\mathcal{H}_3$: (random input) In the oracle, we replace the second party's input to $f$ with a random bitstring. Instead of generating the input as $g_{K_2'}(\mathsf{id})$, the oracle samples a random value.

Claim 3. $\mathcal{A}$ cannot distinguish between $\mathcal{H}_2$ and $\mathcal{H}_3$.

Proof. Suppose not. Then we can break the security of the PRF. We can generate $K_1, K_2, k_1$ ourselves, query the challenger for either the PRF or truly random output, and then compute $b$ as before. If we query a PRF, this experiment is $\mathcal{H}_2$; otherwise it is $\mathcal{H}_3$. $\square$

$\mathcal{H}_3$ corresponds to a fresh sampling oracle, so this finishes the proof.

Correctness. As described in Definition 2.1.2, this is just the case when neither party is corrupted. We again consider a series of hybrids.

Hybrid $\mathcal{H}_0$: (real scheme) In this hybrid, the adversary receives $s_1$ and oracle access to $(g_{K_2}(\mathsf{id}), \mathsf{Eval}(2, k_2, \mathsf{id}))$.

Hybrid $\mathcal{H}_3$: (second party random) We go directly from $\mathcal{H}_0$ to $\mathcal{H}_3$, in which the second party's input to $f$ is random, and her bit is chosen to make the additive-spooky output correct, just as in Hybrid $\mathcal{H}_3$ of the previous argument. Since in this case the adversary only has oracle access to party 1 instead of $K_1, k_1$, this is a strictly weaker argument than the previous argument from $\mathcal{H}_0$ to $\mathcal{H}_3$.

Hybrid $\mathcal{H}_4$: (random bit for first party) In the oracle, we replace the first party's output bit with a random bit.

Claim 4. $\mathcal{A}$ cannot distinguish between $\mathcal{H}_3$ and $\mathcal{H}_4$.

Proof. Suppose not. By Theorem 9 in [3], the output share function is a PRF. Therefore, this breaks the security of the PRF, a contradiction.

Hybrid $\mathcal{H}_5$: (random output for first party) As in $\mathcal{H}_3$, we replace the first party's output with a random one. The argument is the same as in Hybrid $\mathcal{H}_3$.
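The construction in the proof of Theorem 3.1.2 can be sketched as follows. This is a toy illustration under loud assumptions: the "FSS" here simply XOR-shares the full truth table of $h$ over a small session-identifier domain, so its keys grow linearly with the domain and it is not one of the succinct schemes of [3, 4]; HMAC-SHA256 stands in for the PRF $g$; and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def g(key, session_id):
    """Stand-in PRF g_K(id) (assumption): first byte of HMAC-SHA256."""
    return hmac.new(key, session_id.to_bytes(8, "big"), hashlib.sha256).digest()[0]


def fss_gen(h, domain_size):
    """Toy FSS: XOR-share the truth table of h; each share alone is uniform."""
    k1 = [secrets.randbits(1) for _ in range(domain_size)]
    k2 = [k1[sid] ^ h(sid) for sid in range(domain_size)]
    return k1, k2


def fss_eval(share, session_id):
    """Evaluate one party's share at the given session identifier."""
    return share[session_id]


def seed_gen(f, domain_size):
    """Sample PRF keys K1, K2, define h(id) = f(g_K1(id), g_K2(id)), share h."""
    K1, K2 = secrets.token_bytes(32), secrets.token_bytes(32)
    k1, k2 = fss_gen(lambda sid: f(g(K1, sid), g(K2, sid)), domain_size)
    return (K1, k1), (K2, k2)


def inst_gen(seed, session_id):
    """Output (g_Ki(id), Eval(i, ki, id)) for the party holding this seed."""
    K, k = seed
    return g(K, session_id), fss_eval(k, session_id)
```

With seeds `s1, s2 = seed_gen(f, 16)`, the outputs `(x, a) = inst_gen(s1, sid)` and `(y, b) = inst_gen(s2, sid)` satisfy `a ^ b == f(x, y)` for every session identifier in the domain, which is the additive-spooky correlation.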

Chapter 4 Additive-spooky distributions

4.1 Additive-spooky 2-player distributions

In this section, we construct distributed correlation generators for a special class of distributions, additive-spooky distributions, as defined in Definition 3.1.1. We will first construct DCGs for a special case, oblivious transfer, starting from any constrained PRF. As a corollary, we will show an OT protocol that uses amortized $3 + o(1)$ communication per OT. Finally, we extend this construction to a construction for any additive-spooky distribution.

4.1.1 Warmup: oblivious transfer

We note that oblivious transfer is a special case of an additive-spooky distribution. In oblivious transfer, Alice gets $b_0, b_1$, while Bob gets $c, b_c$. Consider the additive-spooky distribution associated with the function $f$ which is 0 if $c = 1$, and $b_0 \oplus b_1$ otherwise, and with $X$ and $Y$ being random bits. By a standard transformation, we can take this distribution to the OT distribution.

Theorem 4.1.1. Assuming constrained PRFs exist, there exists a distributed correlation generator for the OT distribution.

Proof. Let (KeyGen, Eval, Constrain, ConstrainEval) be a constrained PRF with output length 1. Our construction proceeds as follows.

* The SeedGen algorithm samples a PRF seed $\sigma$, another PRF seed $s$ and a constrained key $\sigma_C$, where the circuit $C$ (dependent on $s$) is defined as follows: $C(i \| y) = 1$ if and only if $\mathsf{Eval}(s, i) = y$. It then outputs the two seeds $\sigma$ and $(\sigma_C, s)$.

* The InstGen algorithm for the first party, Alice, on session $\mathsf{id}$, outputs $(\mathsf{Eval}(\sigma, \mathsf{id} \| 0), \mathsf{Eval}(\sigma, \mathsf{id} \| 1))$. The InstGen algorithm for the second party, Bob, on session $i$, computes $y_i = \mathsf{Eval}(s, i)$ and outputs $(y_i, \mathsf{ConstrainEval}(\sigma_C, i \| y_i))$ ($y_i$ is the bit determining which bit Bob receives).

For the security proof, we analyze three cases, corresponding to each set of corrupted parties in the security definition (the case when all parties are corrupted is vacuously true).

Alice is corrupted. We will contradict the security of the PRF (see Definition A.0.1) by constructing an adversary $\mathcal{A}'$ to win the PRF security game. First, $\mathcal{A}'$ samples a PRF key $\sigma$ as the setup would. He then acts as the challenger for $\mathcal{A}$, starting by giving her the PRF key $\sigma$. For each oracle query $i$ that $\mathcal{A}$ issues, $\mathcal{A}'$ queries his own oracle on $i$ and receives a bit $y_i$, which is either a uniformly random value or the value of $\mathsf{Eval}(s, i)$ (depending on $b_2$). Then $\mathcal{A}'$ responds to $\mathcal{A}$'s query with the response $(y_i, \mathsf{Eval}(\sigma, i \| y_i))$.

To prove security, we consider $\mathcal{A}$'s perspective. If $b_2 = 1$ (i.e. $\mathcal{A}'$ receives uniformly random responses), then the response to $\mathcal{A}$'s query is one of her two bits selected at random. This corresponds to the conditionally sampled distribution. On the other hand, if $b_2 = 0$ (i.e. $\mathcal{A}'$ receives the true value of $\mathsf{Eval}(s, i)$), then the response to $\mathcal{A}$'s query is the same as the real distribution. In other words, $\mathcal{A}'$ wins the security game with the same advantage as $\mathcal{A}$, and we are done.

Bob is corrupted. We will contradict the security of the constrained PRF. Our adversary $\mathcal{A}'$ samples a PRF key $s$, then computes $C$ as described in the setup. Then $\mathcal{A}'$ specifies a function $f(x) = C(x)$ and receives the seed $\sigma_C$ from the challenger. Now $\mathcal{A}'$ acts as the challenger, giving $s$ and $\sigma_C$ to $\mathcal{A}$. To answer a query $i$, $\mathcal{A}'$ first computes $y_i = \mathsf{Eval}(s, i)$, and then queries its own oracle $\mathcal{O}$ on $i \| \bar{y}_i$ (the constrained input). $\mathcal{A}'$ then responds to the query with $(\mathsf{ConstrainEval}(\sigma_C, i \| y_i), \mathcal{O}(i \| \bar{y}_i))$ (in the appropriate order).

To prove security, we consider $\mathcal{A}$'s perspective. If $b_2 = 1$ (i.e. $\mathcal{A}'$ receives uniformly random responses on constrained queries), then the response to $\mathcal{A}$'s query is the bit that he already knows, and a random bit. This corresponds to the conditionally sampled distribution. On the other hand, if $b_2 = 0$, then the response to $\mathcal{A}$'s query is $(\mathsf{ConstrainEval}(\sigma_C, i \| y_i), \mathsf{Eval}(\sigma, i \| \bar{y}_i))$, which is computationally indistinguishable from the real distribution (since ConstrainEval and Eval are indistinguishable on unconstrained inputs). In other words, $\mathcal{A}'$ wins the security game with negligibly close to the same advantage as $\mathcal{A}$, and we are done.

Correctness. As described in Definition 2.1.2, this is just the case when neither party is corrupted. We will proceed with three hybrids.

Hybrid $\mathcal{H}_0$: The real game.

Hybrid $\mathcal{H}_1$: We replace Bob's output with a freshly-sampled output, as in the argument in the case of Alice being corrupted (this is a strictly weaker claim, since the adversary only has oracle access to Alice's output, not her seed).

Hybrid $\mathcal{H}_2$: We replace Alice's output with a freshly-sampled output. Security follows by the PRF security of $\sigma$.

Now we show how to obtain $n$ OTs with amortized $3 + o(1)$ bits of communication per OT, that is, $3n + o(n)$ total communication, via a standard transformation. From the random OT, the sender starts with $x_0, x_1$ and the receiver starts with $r, x_r$. The sender also has her actual bits $y_0, y_1$ and the receiver has $s$ and wants to get $y_s$. The receiver sends $d = r \oplus s$, and the sender replies with $y_0 \oplus x_d$ and $y_1 \oplus x_{d \oplus 1}$. The receiver then computes $(y_s \oplus x_{d \oplus s}) \oplus x_r$, which is $y_s$ because $d \oplus s = r$. This uses $3n$ bits of communication. However, we still need to instantiate the trusted party. We can use a 2PC protocol in order to compute SeedGen and initialize the protocol. Assuming CPRFs with subexponential security, the security parameter can be (say) logarithmic in the number of sessions, so that the total communication in the 2PC protocol is at most polylogarithmic (in particular, $o(n)$).
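The standard transformation from random OT to OT referred to above can be sketched as follows. The message layout shown is the textbook derandomization and is an assumption about the exact variant intended; the function names are hypothetical. One bit travels from receiver to sender and two bits travel back, for three bits per OT.

```python
import secrets


def random_ot():
    """Ideal random OT correlation: sender gets (x0, x1), receiver gets (r, x_r)."""
    x = (secrets.randbits(1), secrets.randbits(1))
    r = secrets.randbits(1)
    return x, (r, x[r])


def receiver_message(r, s):
    """Receiver masks its real choice bit s with the random choice bit r (1 bit)."""
    return r ^ s


def sender_message(x, y, d):
    """Sender masks y0 with x_d and y1 with x_{1-d} (2 bits)."""
    return (y[0] ^ x[d], y[1] ^ x[1 ^ d])


def receiver_output(s, x_r, z):
    """z[s] = y_s XOR x_{s XOR d} = y_s XOR x_r, so unmasking with x_r yields y_s."""
    return z[s] ^ x_r
```

A full run draws `x, (r, x_r) = random_ot()`, sends `d = receiver_message(r, s)`, then `z = sender_message(x, y, d)`, and the receiver recovers `receiver_output(s, x_r, z)`, which equals `y[s]`. The other message bit is masked by the pad the receiver does not hold.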

This improves on the 4n bits of communication from Theorem 11 of [12], at the cost

of providing only indistinguishability security instead of simulation security. This may

also be viewed as an OT extension protocol, since the initial two-party computation

can be instantiated with OTs (although these may need to be simulation-secure).

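Program $C_i$

Input: $\mathsf{id}$ (the session number)

Hardcoded: $f$, $K_x$, $K_y$, $K_1, \ldots, K_{i-1}$, and $K'_1, \ldots, K'_{i-1}$

Instructions: On input $\mathsf{id}$, the program $C_i$ does the following.

* Compute $b \leftarrow f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id})) \oplus \left(\bigoplus_{j=1}^{i-1} g_{K_j}(\mathsf{id})\right) \oplus \left(\bigoplus_{j=1}^{i-1} g_{K'_j}(\mathsf{id})\right)$.

* If $b = 0$, output 0 (do not constrain). Otherwise, output 1 (constrain).

Figure 4-1: The constraint for the additive-spooky scheme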

4.1.2 DCGs for general additive-spooky distributions

In this section we show how to construct a distributed correlation generator for any

additive-spooky distribution using private CPRFs.

The high-level idea of our construction is as follows. Alice and Bob each generate their corresponding inputs to $f$ using independent PRF keys. Alice also receives a private CPRF key, and Bob receives a derived key with a particular constraint (which depends on $f(x, y)$ and contains Alice's and Bob's hardcoded PRF keys). This allows Bob to learn something that depends on $f(x, y)$: in particular, he either learns the same bit as Alice, or a random bit. When we XOR these two bits together, we get either 0 (if Bob learns the same bit as Alice), or a random bit (otherwise). By adding more CPRF keys, we can amplify this so that the XOR is either 0, or 1 with high probability, depending on whether $f(x, y)$ is 0 or 1. However, Alice still learns nothing (because she does not even see the derived key), and Bob also learns nothing (because the key hides the constraint).

Theorem 4.1.2. Consider an additive-spooky distribution corresponding to the function $f$. Let (KeyGen, Eval, Constrain, ConstrainEval) be a constrained PRF. Then the following 2-party distributed correlation generator is secure.

* $\mathsf{SeedGen}(1^\lambda)$: See Figure 4-2.

* $\mathsf{InstGen}(s_i, i, \mathsf{id})$: On session $\mathsf{id}$, Alice outputs $(g_{K_x}(\mathsf{id}), \bigoplus_{i=1}^{m} g_{K_i}(\mathsf{id}))$, while Bob outputs $g_{K_y}(\mathsf{id})$ and $\bigoplus_{i=1}^{m} g_{K'_i}(\mathsf{id})$.

Proof. To prove the security of this distributed correlation generator, we must prove it for every set $S \subseteq \{1, 2\}$. If $S = \{1, 2\}$, there is nothing to prove. Hence we will focus on the remaining three cases.

Correctness. As described in Definition 2.1.2, this is just the case when neither party is corrupted. We wish to prove that the output of the protocol, i.e.

$(\mathsf{InstGen}(1, K_x, K_1, \ldots, K_m), \mathsf{InstGen}(2, K_y, K'_1, \ldots, K'_m))$,

is indistinguishable from

* Sample two PRF keys $(K_x, K_y)$ with KeyGen.

* For $i$ from 1 to $m$:

  * Compute $C_i$ as in Figure 4-1, which depends on $f$, $K_x$, $K_y$, $K_1, \ldots, K_{i-1}$ and $K'_1, \ldots, K'_{i-1}$.

  * $K_i \leftarrow \mathsf{KeyGen}()$.

  * $K'_i \leftarrow \mathsf{Constrain}(K_i, C_i)$.

* Output $s_x \leftarrow (1, K_x, K_1, \ldots, K_m)$ and $s_y \leftarrow (2, K_y, K'_1, \ldots, K'_m)$.

Figure 4-2: The seed generation for the additive-spooky scheme

the seeds $s_x$ and $s_y$, as well as the hiding property of the constrained PRF. We will consider five hybrids, where the first hybrid is the real scheme and the last hybrid is the fresh-sampling oracle.

Hybrid $\mathcal{H}_0$: This is the real experiment, in which Alice outputs $g_{K_x}(\mathsf{id})$ and $\bigoplus_{i=1}^{m} g_{K_i}(\mathsf{id})$, and Bob outputs $g_{K_y}(\mathsf{id})$ and $\bigoplus_{i=1}^{m} g_{K'_i}(\mathsf{id})$.

Hybrid $\mathcal{H}_1$: We change Bob's bit output to $f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id})) \oplus \left(\bigoplus_{i=1}^{m} g_{K_i}(\mathsf{id})\right)$, so that the scheme correctly computes $f$ on Alice's and Bob's inputs. Security is by Lemma 4.1.3 below.

Hybrid $\mathcal{H}_2$: We change Alice's bit from $\bigoplus_{i=1}^{m} g_{K_i}(\mathsf{id})$ to a random bit. Security is by the PRF security of $K_i$.

Hybrid $\mathcal{H}_3$: We change Bob's output from $g_{K_y}(\mathsf{id})$ to a random bit. Security is by the PRF security of $K_y$.

Hybrid $\mathcal{H}_4$: Similarly, we change Alice's output from $g_{K_x}(\mathsf{id})$ to a random bit. Security is by the PRF security of $K_x$. In this hybrid, Alice outputs a random $x$ and random bit $a$, Bob outputs a random $y$ and $f(x, y) \oplus a$, which is identical to the fresh-sampling oracle.

We now prove the lemma needed to move to hybrid N1.

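Lemma 4.1.3. For any adversary $\mathcal{A}$, $\mathcal{A}$ cannot distinguish between:

* Alice outputs $g_{K_x}(\mathsf{id})$ and $\bigoplus_{i=1}^{m} g_{K_i}(\mathsf{id})$, Bob outputs $g_{K_y}(\mathsf{id})$ and $\bigoplus_{i=1}^{m} g_{K'_i}(\mathsf{id})$.

* Alice outputs $g_{K_x}(\mathsf{id})$ and $\bigoplus_{i=1}^{m} g_{K_i}(\mathsf{id})$, Bob outputs $g_{K_y}(\mathsf{id})$ and $f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id})) \oplus \left(\bigoplus_{i=1}^{m} g_{K_i}(\mathsf{id})\right)$.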

Proof. We claim that the two hybrids are statistically indistinguishable. It suffices to show that with all but negligible probability, $\bigoplus_{i=1}^{m} g_{K'_i}(\mathsf{id}) = f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id})) \oplus \left(\bigoplus_{i=1}^{m} g_{K_i}(\mathsf{id})\right)$. In other words, $\bigoplus_{i=1}^{m} \left(g_{K_i}(\mathsf{id}) \oplus g_{K'_i}(\mathsf{id})\right) = f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id}))$ with all but negligible probability.

Suppose that $f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id})) = 0$. Then, considering the constraint in Figure 4-1 and by the correctness of the constrained PRF, we must have $g_{K_i}(\mathsf{id}) = g_{K'_i}(\mathsf{id})$ for every $i$, so that $\bigoplus_{i=1}^{m} \left(g_{K_i}(\mathsf{id}) \oplus g_{K'_i}(\mathsf{id})\right) = 0 = f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id}))$, and we are done.

Otherwise, $f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id})) = 1$. We claim that for every $i$, if $C_i(\mathsf{id}) = 1$ (i.e. the output is constrained), then $g_{K_i}(\mathsf{id}), g_{K'_i}(\mathsf{id})$ are statistically close to independent coin flips with probability $1/2$. First, by the PRF security of $g$, $g_{K_i}(\mathsf{id})$ is statistically indistinguishable from a coin flip (this corresponds to guessing $b_2$ in the security game). Second, if $g_{K'_i}(\mathsf{id})$ were statistically distinguishable from a coin flip (i.e. biased towards 0 or 1), then an adversary, given the constrained key, could distinguish the constrained inputs from unconstrained inputs (this corresponds to guessing $b_1$). Finally, if $g_{K_i}(\mathsf{id})$ and $g_{K'_i}(\mathsf{id})$ were positively or negatively correlated, then an adversary, given the constrained key, could distinguish the real evaluations from random, violating the constrained PRF security (this corresponds to guessing $b_2$ again).

Since $g_{K_i}(\mathsf{id}), g_{K'_i}(\mathsf{id})$ are statistically close to independent coin flips, the probability that $g_{K_i}(\mathsf{id}) \oplus g_{K'_i}(\mathsf{id}) = 1$ is statistically close to $1/2$. If there is any $j$ such that $\bigoplus_{i=1}^{j} \left(g_{K_i}(\mathsf{id}) \oplus g_{K'_i}(\mathsf{id})\right) = f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id}))$, then for all $j' > j$, $g_{K_{j'}}(\mathsf{id}) = g_{K'_{j'}}(\mathsf{id})$ and therefore $\bigoplus_{i=1}^{m} \left(g_{K_i}(\mathsf{id}) \oplus g_{K'_i}(\mathsf{id})\right) = f(g_{K_x}(\mathsf{id}), g_{K_y}(\mathsf{id}))$, as desired. So, the only way for this not to happen is if $g_{K_i}(\mathsf{id}) = g_{K'_i}(\mathsf{id})$ for every $i$, which happens only with probability (statistically close to) $2^{-m}$, as desired. $\square$
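The counting argument in the proof above can be checked on an idealized model. In the sketch below, each constrained round contributes an independent uniform coin to the running XOR of Alice's and Bob's bits, and each unconstrained round contributes 0; this idealization replaces the PRF and constrained PRF by truly random bits, and the names `run_rounds` and `failure_count` are hypothetical.

```python
from itertools import product


def run_rounds(f_value, coins):
    """Return the final XOR of all (Alice bit XOR Bob bit) pairs.

    coins[i] is the XOR contributed by round i IF that round is constrained.
    A round is constrained exactly when the running XOR differs from f_value,
    mirroring the bit b computed by the program C_i in Figure 4-1.
    """
    acc = 0
    for coin in coins:
        constrained = f_value ^ acc
        if constrained:
            acc ^= coin  # Bob's bit behaves like an independent coin flip
        # otherwise Bob's bit equals Alice's and the round contributes 0
    return acc


def failure_count(f_value, m):
    """Number of the 2**m coin sequences for which the final XOR misses f_value."""
    return sum(run_rounds(f_value, c) != f_value for c in product((0, 1), repeat=m))
```

In this model `failure_count(0, m)` is 0 for every `m`, and `failure_count(1, m)` is 1, namely the all-zero coin sequence, so the failure probability is exactly $2^{-m}$.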

Alice is corrupted. We now prove that Alice, given her seed (1, K, K1, ... , Km), cannot distinguish Bob's output (i.e. InstGen (2, Ky, K', ... , K' )) from random. This proof is straightforward, because Alice's entire input (sr, K) is generated before Bob's input is determined, and therefore Alice would be distinguishing the output of a PRF from random.

Proof. In the first experiment, Alice gets oracle access to $g_{K_y}(id)$ and $\bigoplus_{i=1}^{m} g_{K'_i}(id)$. As proved in Lemma 4.1.3, with all but negligible probability, the bit is equal to $f(g_{K_x}(id), g_{K_y}(id)) \oplus \bigoplus_{i=1}^{m} g_{K_i}(id)$. Therefore, the first experiment is indistinguishable from an experiment in which Alice gets oracle access to $g_{K_y}(id)$ and $f(g_{K_x}(id), g_{K_y}(id)) \oplus \bigoplus_{i=1}^{m} g_{K_i}(id)$.

Now suppose towards a contradiction that some adversary A distinguishes this experiment from the second experiment. Then there is an adversary A' that distinguishes between the output of a PRF and random. A' gets oracle access to either $g_{K_y}(id)$ or a random oracle. He generates Alice's keys himself and gives them to A. To answer queries, he queries the oracle on id to get $y$, and outputs $(y, f(g_{K_x}(id), y) \oplus \bigoplus_{i=1}^{m} g_{K_i}(id))$. Note that if $y$ is $g_{K_y}(id)$, then this is the experiment described previously, whereas if $y$ is random, then this is a fresh-sampling oracle (the second experiment). Therefore, A' distinguishes with the same probability as A, a contradiction. □

Bob is corrupted. We now prove that Bob, given his seed $(2, K_y, K'_1, \ldots, K'_m)$, cannot distinguish Alice's output (i.e. $InstGen(1, K_x, K_1, \ldots, K_m)$) from random (recall that Alice's and Bob's seeds start with 1 and 2 respectively to indicate which party they are).

Proof. We will consider four hybrids, where the first hybrid is the real scheme and the last hybrid is the fresh-sampling oracle.

Hybrid $\mathcal{H}_0$: This is the real experiment, in which Bob gets $K_y$ and $K'_1, \ldots, K'_m$, and oracle access to Alice's output, $g_{K_x}(id)$ and $\bigoplus_{i=1}^{m} g_{K_i}(id)$.

Hybrid $\mathcal{H}_1$: We change Alice's bit output to $f(g_{K_x}(id), g_{K_y}(id)) \oplus \bigoplus_{i=1}^{m} g_{K'_i}(id)$, i.e. the output is always correct. This is statistically indistinguishable from the previous hybrid by Lemma 4.1.3.

Hybrid $\mathcal{H}_2$: We change Alice's output from $g_{K_x}(id)$ to $g_{K'_x}(id)$, where $K'_x$ is a PRF key freshly sampled (after the original $K_x$ was already used in computing Bob's keys $K'_1, \ldots, K'_m$). The output remains correct, i.e. it is $f(g_{K'_x}(id), g_{K_y}(id)) \oplus \bigoplus_{i=1}^{m} g_{K'_i}(id)$.

The indistinguishability of $\mathcal{H}_1$ and $\mathcal{H}_2$ follows from the constraint-hiding property of private constrained PRFs: if an adversary A can distinguish these two experiments, then it can distinguish between experiments in which the constraints are based on $K_x$ and $K'_x$ respectively, a contradiction.

Hybrid $\mathcal{H}_3$: We change Alice's output from $g_{K'_x}(id)$ to a freshly sampled $x$. The output remains correct.

The indistinguishability of $\mathcal{H}_2$ and $\mathcal{H}_3$ follows from the security of pseudorandom functions. □


Chapter 5 Distributed Correlation Generators


Program $P_i$

Input: id

Hardcoded: Puncturable PRF keys $K_1, \ldots, K_n$, party number $i$

Instructions: On input id, the program $P_i$ does the following.

• For each $j$ from 1 to $n$, compute $\rho_j \leftarrow F_{K_j}(id)$ and $x_j \leftarrow Sample(j, 1, 2, \ldots, j-1, x_1, \ldots, x_{j-1})$, where $\rho_j$ is the randomness used in Sample.

• Output $x_i$.

Figure 5-1: The program to be obfuscated

5.1 Distributed Correlation Generators from IO

The high level idea behind the construction is as follows. We can generate an unbounded stream of random joint samples using a PRF. However, if we gave the PRF seed to everyone (in order for each party to generate the samples), each party would be able to generate the other parties' samples as well. To prevent this, we obfuscate a program that uses the PRF seed to generate the joint sample, and then only outputs the appropriate coordinate.

Let $\mathcal{O}$ be an indistinguishability obfuscator with subexponential hardness and let $F$ be a puncturable PRF with input length $\ell$. Then consider the following distributed correlation generator, where $P_i$ is as defined in Figure 5-1.

• $SeedGen(1^\lambda, 1^n)$: Sample $n$ PRF keys $K_1, \ldots, K_n$. Compute the programs $P_i$ for each $i$ from 1 to $n$. Output $s_1, \ldots, s_n = \mathcal{O}(P_1), \ldots, \mathcal{O}(P_n)$.

• $InstGen(s_i, i, id)$: Output $\mathcal{O}(P_i)(id)$.

Theorem 5.1.1. Assuming subexponentially secure PPRFs and IO, there is a distributed correlation generator for all functionalities.
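To make the data flow of SeedGen and InstGen concrete, here is a minimal sketch that is deliberately insecure: a Python closure stands in for the obfuscated program (it hides nothing), HMAC-SHA256 stands in for the puncturable PRF $F$, and the target distribution is a hypothetical toy one ($n$ bits whose XOR is 0). All function names are illustrative assumptions.

```python
import hashlib
import hmac
import os

def prf(key: bytes, session_id: int) -> bytes:
    """Stand-in for F_K(id); HMAC-SHA256 is an assumption, not the thesis's PRF."""
    return hmac.new(key, session_id.to_bytes(8, "big"), hashlib.sha256).digest()

def sample(j: int, n: int, earlier: list, rho: bytes) -> int:
    """Toy conditional sampler: coordinates 1..n-1 are uniform bits,
    coordinate n is the XOR of all earlier coordinates."""
    if j < n:
        return rho[0] & 1
    parity = 0
    for x in earlier:
        parity ^= x
    return parity

def make_program(keys: list, i: int):
    """Program P_i of Figure 5-1 as a plain closure (NOT obfuscated)."""
    n = len(keys)
    def program(session_id: int) -> int:
        xs = []
        for j in range(1, n + 1):
            rho = prf(keys[j - 1], session_id)   # rho_j <- F_{K_j}(id)
            xs.append(sample(j, n, xs, rho))     # x_j sampled given x_1..x_{j-1}
        return xs[i - 1]                         # output only coordinate i
    return program

def seed_gen(n: int) -> list:
    """Sample n keys and return one program per party."""
    keys = [os.urandom(32) for _ in range(n)]
    return [make_program(keys, i) for i in range(1, n + 1)]

def inst_gen(seed, i: int, session_id: int) -> int:
    """Party i evaluates its program on the session identifier."""
    return seed(session_id)
```

Each party recomputes the whole joint sample and releases only its own coordinate, so the parties' outputs are consistent for every id without any interaction; the security of the real scheme rests entirely on the obfuscation step that this sketch omits.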

Proof. The high level idea behind the proof is as follows. If we knew in advance which parties the adversary would corrupt, we could have them sample their coordinates first, and then have all of the remaining parties sample their coordinates conditioned on the corrupt parties' values (using the corrupt parties' seeds). Since the corrupt parties only know their own randomness, this scheme is trivially secure. By using IO, we can obscure the order in which parties are sampled. Since the distribution is conditionally samplable, the outcome takes the same distribution regardless of the order of sampling.

As described previously, we will argue that the programs $P_i$, when obfuscated, conceal the order in which the variables are sampled, via a hybrid argument. For the detailed proof, we refer the reader to Appendix A.1. □
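The claim that the outcome has the same distribution regardless of the sampling order is the chain rule of probability. The following sketch checks it exactly on a hypothetical two-bit joint distribution (the table `JOINT` and the helper names are illustrative assumptions).

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical joint distribution over two bits (x_1, x_2); every outcome
# has positive probability, so all conditionals below are well defined.
JOINT = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 4)}

def marginal(fixed: dict) -> Fraction:
    """Probability that every coordinate in `fixed` takes its stated value."""
    return sum(p for xs, p in JOINT.items()
               if all(xs[c] == v for c, v in fixed.items()))

def chain_rule(order: tuple) -> dict:
    """Distribution obtained by sampling coordinates in `order`, each one
    conditioned on the coordinates sampled before it."""
    result = {}
    for xs in JOINT:
        prob, fixed = Fraction(1), {}
        for c in order:
            before = marginal(fixed)
            fixed[c] = xs[c]
            prob *= marginal(fixed) / before   # Pr[x_c | earlier coordinates]
        result[xs] = prob
    return result
```

For every ordering the product of conditionals telescopes back to the joint probability, which is why the hybrids may place the corrupted parties first without changing the distribution of the outputs.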


Appendix A

Preliminaries and Definitions

We denote the security parameter by $\lambda$. We use the notation $D_0 \equiv D_1$ to denote that the two distributions $D_0$ and $D_1$ are identically distributed.

A.0.1 Constrained Pseudo-Random Functions

The notion of constrained PRFs was introduced by the works of [6, 1, 14]. We present the definition of private CPRFs from [7]. A private CPRF key can be used to derive a key $\sigma_f$ constrained by a function $f$: on inputs $x$ such that $f(x) = 0$, the key can be used to correctly evaluate the PRF, but otherwise, the output on the constrained key does not reveal the output of the PRF on the real key. Furthermore, $\sigma_f$ does not reveal any information about the function $f$.

Definition A.0.1 (Constrained PRF). A constrained pseudo-random function (PRF) family is defined by algorithms (KeyGen, Eval, Constrain, ConstrainEval) where:

• $KeyGen(1^\lambda, 1^\ell, 1^d, 1^r)$ is a PPT algorithm that takes as input the security parameter $\lambda$, a circuit max-length $\ell$, a circuit max-depth $d$ and an output space $r$, and outputs a PRF key $\sigma$ and public parameters pp.

• $Eval_{pp}(\sigma, x)$ is a deterministic algorithm that takes as input a key $\sigma$ and a string $x \in \{0, 1\}^*$, and outputs $y \in \mathbb{Z}_r$;

• $Constrain_{pp}(\sigma, f)$ is a PPT algorithm that takes as input a PRF key $\sigma$ and a circuit $f : \{0, 1\}^* \to \{0, 1\}$, and outputs a constrained key $\sigma_f$;

• $ConstrainEval_{pp}(\sigma_f, x)$ is a deterministic algorithm that takes as input a constrained key $\sigma_f$ and a string $x \in \{0, 1\}^*$, and outputs either a string $y \in \mathbb{Z}_r$ or $\bot$.

Definition A.0.2. Consider the following game between a PPT adversary A and a challenger:

2. The challenger generates $(pp, seed) \leftarrow KeyGen(1^\lambda, 1^\ell, 1^d, 1^r)$. It flips three coins $b_1, b_2, b_3 \leftarrow \{0, 1\}$. Intuitively, $b_1$ selects whether $f_0$ or $f_1$ is used for the constraint, $b_2$ selects whether a real or random value is returned on non-constrained queries, and $b_3$ selects whether the actual or constrained value is returned on constrained queries. The challenger creates $seed_f \leftarrow Constrain_{pp}(seed, f_{b_1})$, and sends $(pp, seed_f)$ to A.

3. A adaptively sends unique queries $x \in \{0, 1\}^*$ to the challenger (i.e. no $x$ is queried more than once). The challenger returns:

$\bot$, if $f_0(x) \neq f_1(x)$.

$U(\mathbb{Z}_r)$, if $(f_0(x) = f_1(x) = 1) \wedge (b_2 = 1)$.

$ConstrainEval_{pp}(\sigma_f, x)$, if $(f_0(x) = f_1(x) = 0) \wedge (b_3 = 0)$.

$Eval_{pp}(\sigma, x)$, otherwise.

4. A sends a guess $(i, b')$.

The advantage of the adversary in this game is $Adv[A] = |\Pr[b' = b_i] - 1/2|$. A family of PRFs (KeyGen, Eval, Constrain, ConstrainEval) is a single-key constraint-hiding selective-function constrained PRF if for every PPT adversary A, $Adv[A] = negl(\lambda)$.

We will also make use of regular (non-private) CPRFs, in which case we have the same definition except that there is no $b_1$ (the adversary sends only one function $f$), and regular PRFs, which are the same definition except without $b_1$ or $b_3$ (there are no constraints). Finally, we will make use of puncturable PRFs, which are a special case of CPRFs in which the function is 1 (constrained) at only one point.
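For intuition about puncturable PRFs, the sketch below implements the standard GGM tree construction, which is one well-known way to obtain them and is not necessarily the instantiation intended here. SHA-256 is used as a heuristic stand-in for a length-doubling PRG, and all function names are illustrative assumptions.

```python
import hashlib

def prg(seed: bytes, bit: int) -> bytes:
    """One half of a length-doubling PRG, modeled heuristically with SHA-256."""
    return hashlib.sha256(bytes([bit]) + seed).digest()

def bits_of(x: int, length: int) -> list:
    """Big-endian bit decomposition of x into `length` bits."""
    return [(x >> (length - 1 - k)) & 1 for k in range(length)]

def evaluate(key: bytes, x: int, length: int) -> bytes:
    """GGM evaluation: walk from the root along the bits of x."""
    node = key
    for b in bits_of(x, length):
        node = prg(node, b)
    return node

def puncture(key: bytes, point: int, length: int) -> dict:
    """Punctured key: the sibling seed at each level of the path to `point`."""
    siblings, node = {}, key
    path = bits_of(point, length)
    for depth, b in enumerate(path):
        siblings[tuple(path[:depth]) + (1 - b,)] = prg(node, 1 - b)
        node = prg(node, b)
    return siblings

def punctured_evaluate(siblings: dict, x: int, length: int) -> bytes:
    """Evaluate with the punctured key; fails only on the punctured point."""
    path = bits_of(x, length)
    for depth in range(1, length + 1):
        node = siblings.get(tuple(path[:depth]))
        if node is not None:
            for b in path[depth:]:
                node = prg(node, b)
            return node
    raise ValueError("input is the punctured point")
```

The punctured key contains no seed on the path to the punctured point, so it agrees with the full key everywhere else while the value at that point stays hidden (assuming the PRG is secure).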

A.0.2 Indistinguishability Obfuscation

We use the definition of indistinguishability obfuscation as defined in [10]. Informally speaking, indistinguishability obfuscation is a compiler that transforms a circuit into a functionally equivalent circuit such that obfuscations of two functionally equivalent circuits are computationally indistinguishable.

Definition A.0.3 (Indistinguishability Obfuscator). A uniform PPT machine $\mathcal{O}$ is called an indistinguishability obfuscator (for P/Poly) if the following conditions are satisfied:

• For all security parameters $\lambda \in \mathbb{N}$, for all circuits $C$ of size $\lambda$, for all inputs $x$, we have that

$\Pr[C'(x) = C(x) : C' \leftarrow \mathcal{O}(C)] = 1$

• For any nonuniform PPT distinguisher $D$, there exists a negligible function $\alpha$ such that the following holds: For all security parameters $\lambda \in \mathbb{N}$, for all pairs of circuits $C_0, C_1$ of size $\lambda$, we have that if $C_0(x) = C_1(x)$ for all inputs $x$, then

$|\Pr[D(\mathcal{O}(C_0)) = 1] - \Pr[D(\mathcal{O}(C_1)) = 1]| \leq \alpha(\lambda)$

A.0.3 Function Secret Sharing

We present the definition of function secret sharing from [3].

A two-party function secret sharing (FSS) scheme with respect to the XOR output decoder and function class P/Poly is a pair of PPT algorithms (Gen, Eval) with the following syntax:

• $Gen(1^\lambda, f)$: On input the security parameter $\lambda$ and function description $f$, the key generation algorithm outputs two keys $(k_1, k_2)$.

• $Eval(i, k_i, x)$: On input a party index $i$, key $k_i$, and input string $x$, the evaluation algorithm outputs a value $y_i$, corresponding to this party's share of $f(x)$.

satisfying the following correctness and security requirements:

• Correctness: for all $f$, $x$,

$\Pr[(k_1, k_2) \leftarrow Gen(1^\lambda, f) : Dec(Eval(1, k_1, x), Eval(2, k_2, x)) = f(x)] = 1$

• Security: Consider the following indistinguishability challenge experiment for corrupted parties $T$:

- The adversary outputs $(f_0, f_1, state) \leftarrow A(1^\lambda)$.

- The challenger samples $b \leftarrow \{0, 1\}$ and $(k_1, k_2) \leftarrow Gen(1^\lambda, f_b)$.

- The adversary outputs a guess $b' \leftarrow A((k_i)_{i \in T}, state)$, given the keys for corrupted $T$.

Denote by $Adv(1^\lambda, A) := \Pr[b = b'] - 1/2$ the advantage of A in guessing $b$ in the above experiment, where probability is taken over the randomness of the challenger and of A. We say the scheme (Gen, Eval) is secure if there exists a negligible function $\nu$ such that for all non-uniform PPT adversaries A, it holds that $Adv(1^\lambda, A) \leq \nu(\lambda)$.
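To illustrate the (Gen, Eval, Dec) syntax with the XOR decoder, here is a deliberately naive sketch: it secret-shares the entire truth table of $f$, so its keys grow exponentially with the input length and it is in no way a succinct FSS scheme. The function names and the truth-table representation are assumptions made for illustration only.

```python
import secrets

def gen(truth_table: list) -> tuple:
    """Gen: XOR-share the truth table of f (one output bit per input)."""
    k1 = [secrets.randbelow(2) for _ in truth_table]
    k2 = [r ^ bit for r, bit in zip(k1, truth_table)]
    return k1, k2

def eval_share(i: int, key: list, x: int) -> int:
    """Eval: party i's share of f(x) is simply entry x of its key."""
    return key[x]

def dec(y1: int, y2: int) -> int:
    """XOR output decoder."""
    return y1 ^ y2
```

Each key on its own is a uniformly random bit string of the table's length, so a single corrupted party learns nothing about $f$ beyond its domain size, while the two shares always XOR to $f(x)$.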

A.1 Proof of Theorem 5.1.1

We first restate the IO security theorem.

Theorem A.1.1. Assuming subexponentially secure PPRFs and IO, there is a distributed correlation generator for all functionalities.

Proof. The high level idea behind the proof is as follows. If we knew in advance which parties the adversary would corrupt, we could have them sample their coordinates first, and then have all of the remaining parties sample their coordinates conditioned on the corrupt parties' values (using the corrupt parties' seeds). Since the corrupt parties only know their own randomness, this scheme is trivially secure. By using IO, we can obscure the order in which parties are sampled. Since the distribution is conditionally samplable, the outcome takes the same distribution regardless of the order of sampling.

As described previously, we will argue that the programs $P_i$, when obfuscated, conceal the order in which the variables are sampled, via a hybrid argument. We will consider four hybrids. The first hybrid, $\mathcal{H}_0$, is the real scheme as previously described. In this scheme, the adversary gets access to $\mathcal{O}(P_i)$ for $i \in S$, and oracle access to $\mathcal{O}(P_i)$ for any $i$, where $P_i$ is as shown in Figure 5-1.

Program $P'_i$

Input: id

Hardcoded: Puncturable PRF keys $K_1, \ldots, K_n$, party number $i$, permutation $\Pi$

Instructions: On input id, the program $P'_i$ does the following.

• For each $j$ from 1 to $n$, compute $\rho_{\Pi(j)} \leftarrow F_{K_{\Pi(j)}}(id)$ and $x_{\Pi(j)} \leftarrow Sample(\Pi(j), \Pi(1), \Pi(2), \ldots, \Pi(j-1), x_{\Pi(1)}, \ldots, x_{\Pi(j-1)})$, where $\rho_{\Pi(j)}$ is the randomness used in Sample.

Figure A-1: The program with reordered sampling

Program $P''_i$

Input: id

Hardcoded: Puncturable PRF keys $K_{\Pi(1)}, K_{\Pi(2)}, \ldots, K_i$, party number $i$, permutation $\Pi$

Instructions: On input id, the program $P''_i$ does the following.

• For each $j$ from 1 to $\Pi^{-1}(i)$, compute $\rho_{\Pi(j)} \leftarrow F_{K_{\Pi(j)}}(id)$ and $x_{\Pi(j)} \leftarrow Sample(\Pi(j), \Pi(1), \Pi(2), \ldots, \Pi(j-1), x_{\Pi(1)}, \ldots, x_{\Pi(j-1)})$, where $\rho_{\Pi(j)}$ is the randomness used in Sample.

• Output $x_i$.

Figure A-2: The program with only necessary keys

A.1.1 $\mathcal{H}_1$: Changing the sampling order

In Hybrid 1, we change the order in which the variables are sampled, in order to place all the corrupted parties first, and the honest parties last. The adversary receives $\mathcal{O}(P'_i)$ for $i \in S$, and oracle access to $\mathcal{O}(P'_i)$ for any $i$, where $P'_i$ is defined in Figure A-1.

Claim 5. For any adversary A and set S, A cannot distinguish between $\mathcal{H}_0$ and $\mathcal{H}_1$.

We will prove this by moving in a sequence of $2^\ell$ hybrids, one for each session, where the first hybrid corresponds to $\mathcal{H}_0$ and the last to $\mathcal{H}_1$. In hybrid $\mathcal{H}_{0,r}$, the programs are sampled according to Figure A-3.

Claim 6. For any adversary A, set S, and index $r$, A cannot distinguish between $\mathcal{H}_{0,r}$ and $\mathcal{H}_{0,r+1}$.

Each of these steps in turn involves hardcoding, switching, and un-hardcoding each program's values for that session. We again use a hybrid argument. In Hybrid $\mathcal{H}_{0,r,k}$ for $k \leq n$, the first $k$ programs are sampled according to Figure A-4 (i.e. hardcoded), where the hardcoded values $v$ are the same as would be sampled using the original order, whereas the rest are sampled according to Figure A-3 (note that Hybrid $\mathcal{H}_{0,r,0}$ is therefore the same as Hybrid $\mathcal{H}_{0,r}$). From Hybrid $\mathcal{H}_{0,r,n}$ to Hybrid $\mathcal{H}_{0,r,n+1}$, we switch the hardcoded values $v$ from being sampled using the original order to being sampled using the new order (that places the corrupted parties first). Finally, in Hybrid $\mathcal{H}_{0,r,k}$ for $k \geq n + 1$, the first $2n + 1 - k$ programs are hardcoded and the rest are not (note that Hybrid $\mathcal{H}_{0,r,2n+1}$ is the same as Hybrid $\mathcal{H}_{0,r+1}$). Thus, to prove Claim 6, it is sufficient to prove Claim 7 below.
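The indexing of the inner hybrids is easy to get wrong, so the following sketch tabulates it. It assumes the reading above ($k$ ranges over $0, \ldots, 2n+1$, with $k \leq n$ using values from the original order and $k \geq n+1$ using values from the new order); the function names are hypothetical.

```python
def hardcoded_count(k: int, n: int) -> int:
    """Number of programs with a hardcoded value in the k-th inner hybrid."""
    if not 0 <= k <= 2 * n + 1:
        raise ValueError("k must lie in 0..2n+1")
    return k if k <= n else 2 * n + 1 - k

def value_order(k: int, n: int) -> str:
    """Which sampling order produced the hardcoded values v."""
    return "original" if k <= n else "new"

def schedule(n: int) -> list:
    """All 2n+2 inner hybrids as (k, number hardcoded, order of v)."""
    return [(k, hardcoded_count(k, n), value_order(k, n))
            for k in range(2 * n + 2)]
```

In this schedule the count rises from 0 to $n$, stays at $n$ across the single step where only the distribution of $v$ changes, and then falls back to 0, so every step changes either one program or only the hardcoded values.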

Claim 7. For any adversary A, set S, and indices $r$, $k$, A cannot distinguish between $\mathcal{H}_{0,r,k}$ and $\mathcal{H}_{0,r,k+1}$.

Proof. Let's first consider the case of $k < n$. Then $\mathcal{H}_{0,r,k}$ and $\mathcal{H}_{0,r,k+1}$ differ only in a single program which is functionally equivalent. By the security of IO, the adversary cannot distinguish these two hybrids.

Now let's consider going from $k = n$ to $k = n + 1$. We claim that both of these experiments are indistinguishable from the experiment in which $v$ is generated independently at random (according to $D$). Suppose not; then an adversary can distinguish oracle access to the punctured points of $n - |S|$ PPRFs (which are used to generate the hardcoded $v$s) from random (which are used to generate the fresh random $v$s), contradicting the security of the PPRF. Since both of these experiments are indistinguishable from random with probability $n\delta$, they are indistinguishable from each other with probability $2n\delta$.
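The last step is the triangle inequality. Writing $\mathcal{H}_{\mathrm{rand}}$ for the experiment in which $v$ is drawn fresh from $D$, and $\varepsilon$ for whatever bound holds on each of the two gaps (both symbols are introduced here only for illustration), the step reads:

```latex
\begin{align*}
\bigl|\Pr[A(\mathcal{H}_{0,r,n}) = 1] - \Pr[A(\mathcal{H}_{0,r,n+1}) = 1]\bigr|
&\le \bigl|\Pr[A(\mathcal{H}_{0,r,n}) = 1] - \Pr[A(\mathcal{H}_{\mathrm{rand}}) = 1]\bigr| \\
&\quad + \bigl|\Pr[A(\mathcal{H}_{\mathrm{rand}}) = 1] - \Pr[A(\mathcal{H}_{0,r,n+1}) = 1]\bigr| \\
&\le \varepsilon + \varepsilon = 2\varepsilon .
\end{align*}
```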


Program $Q_{i,r}$

Input: id

Hardcoded: Puncturable PRF keys $K_1, \ldots, K_n$, party number $i$, permutation $\Pi$, hybrid number $r$

Instructions: On input id, the program $Q_{i,r}$ does the following.

• If id $< r$: for each $j$ from 1 to $n$, compute $\rho_j \leftarrow F_{K_j}(id)$ and $x_j \leftarrow Sample(j, 1, 2, \ldots, j-1, x_1, \ldots, x_{j-1})$, where $\rho_j$ is the randomness used in Sample.

• If id $\geq r$: for each $j$ from 1 to $\Pi^{-1}(i)$, compute $\rho_{\Pi(j)} \leftarrow F_{K_{\Pi(j)}}(id)$ and $x_{\Pi(j)} \leftarrow Sample(\Pi(j), \Pi(1), \Pi(2), \ldots, \Pi(j-1), x_{\Pi(1)}, \ldots, x_{\Pi(j-1)})$, where $\rho_{\Pi(j)}$ is the randomness used in Sample.

• Output $x_i$.

Figure A-3: Switching from one ordering to another

A.1.2 $\mathcal{H}_2$: Removing unnecessary keys

In the third hybrid, we give to each program only the keys that are necessary to compute the output. Here the adversary receives $\mathcal{O}(P''_i)$ for $i \in S$, and oracle access to $\mathcal{O}(P''_i)$ for any $i$, where $P''_i$ is defined as in Figure A-2.

Claim 8. For any adversary A and set S, A cannot distinguish between $\mathcal{H}_1$ and $\mathcal{H}_2$.

As in the hybrid step from $\mathcal{H}_{0,r,k}$ to $\mathcal{H}_{0,r,k+1}$, we are replacing obfuscations of programs with obfuscations of equivalent programs, so by a hybrid argument we can replace all the programs, or else we contradict the security of the IO scheme.

A.1.3 $\mathcal{H}_3$: Fresh-sampling oracle

In the fourth hybrid, we replace the oracle to the real scheme with a fresh-sampling oracle. Here the adversary receives $\mathcal{O}(P''_i)$ for $i \in S$, and oracle access to a fresh-sampling oracle.

Claim 9. For any adversary A and set S, A cannot distinguish between $\mathcal{H}_2$ and $\mathcal{H}_3$.

Proof. Suppose not. Note that the adversary A has access to programs which contain only the PRF keys corresponding to the corrupted parties, and not any PRF keys corresponding to honest parties. We will contradict the security of the PRF. Our adversary B samples the keys $K_{\Pi(1)}, K_{\Pi(2)}, \ldots, K_i$ himself, produces the obfuscated programs $\mathcal{O}(P''_i)$, and gives them to A. B answers oracle queries by querying its own oracle to compute $F_{K_{\Pi(j)}}$ for the remaining keys, which is either a PRF on randomly chosen keys, or truly random. If it is the former, then A's view corresponds to the first experiment; if the latter, then A's view corresponds to the second experiment. Thus, B distinguishes between the PRF output and random values with probability $n\delta$, a contradiction. □

Program $Q_{i,r,v}$

Input: id

Hardcoded: Puncturable PRF keys $K_1, \ldots, K_n$ punctured at $r$, party number $i$, permutation $\Pi$, hybrid number $r$

Instructions: On input id, the program $Q_{i,r,v}$ does the following.

• If id $< r$: for each $j$ from 1 to $n$, compute $\rho_j \leftarrow F_{K_j}(id)$ and $x_j \leftarrow Sample(j, 1, 2, \ldots, j-1, x_1, \ldots, x_{j-1})$, where $\rho_j$ is the randomness used in Sample.

• If id $= r$: $x_i \leftarrow v$.

• If id $> r$: for each $j$ from 1 to $\Pi^{-1}(i)$, compute $\rho_{\Pi(j)} \leftarrow F_{K_{\Pi(j)}}(id)$ and $x_{\Pi(j)} \leftarrow Sample(\Pi(j), \Pi(1), \Pi(2), \ldots, \Pi(j-1), x_{\Pi(1)}, \ldots, x_{\Pi(j-1)})$, where $\rho_{\Pi(j)}$ is the randomness used in Sample.

Figure A-4: Hardcoding the value for one session

A.1.4 $\mathcal{H}_4$: Restoring the original programs

We have now replaced the oracle access to the real program with oracle access to a fresh-sampling oracle. However, in the previous hybrid, the adversary receives a different program ($P''_i$) than the real scheme. Therefore, in the last hybrid, we switch the obfuscated programs back to the real one. Here the adversary receives $\mathcal{O}(P_i)$ for $i \in S$, and oracle access to a fresh-sampling oracle.

Claim 10. For any adversary A and set S, A cannot distinguish between $\mathcal{H}_3$ and $\mathcal{H}_4$.

This is strictly weaker than going from $\mathcal{H}_0$ to $\mathcal{H}_3$, since if an adversary could distinguish these two experiments, it could also distinguish $\mathcal{H}_0$ and $\mathcal{H}_3$ by just ignoring


Chapter 1

Introduction

Connections to FSS and Additive-Spooky Encryption. We start by

Constructions for Additive-Spooky Distributions. Our main contribution is a

number of random oblivious transfer (OT) correlations using a fixed amount of correlated randomness. Indeed, in the special case of random OT, we are able to simplify the construction somewhat, and relax the assumption to just constrained PRFs (without privacy). Furthermore, building on the standard transformation from random OT to OT, we show how to achieve $n$ instances of OT with communication $3n + o(n)$, which improves on the work of Boyle, Gilboa and Ishai [5], who achieve $4n + o(n)$, albeit at the price of achieving the weaker indistinguishability security.

Constructions for General Conditionally Sampleable Distributions. Finally, we show how to construct DCGs for arbitrary conditionally sampleable distributions assuming the existence of secure indistinguishability obfuscation schemes. For more details, we refer the reader to Section 5.1.

Chapter

Distributed Correlation Generators

2.1 Distributed Correlation Generators

We define the notion of distributed correlation generators DCG for a distribution $D$. It consists of two probabilistic polynomial time algorithms SeedGen and InstGen defined below.

* Seed Generator, $SeedGen(1^\lambda, 1^n)$: On input $\lambda$, the security parameter and $n$,

* Instance Generator, $InstGen(s_i, i, id)$: On input the seed of the $i$th party