
Information Flow in Interactive Systems*

Mário S. Alvim¹, Miguel E. Andrés², and Catuscia Palamidessi¹

¹ INRIA and LIX, École Polytechnique, Palaiseau, France
² Institute for Computing and Information Sciences, The Netherlands

Abstract. We consider the problem of defining the information leakage in interactive systems where secrets and observables can alternate during the computation. We show that the information-theoretic approach which interprets such systems as (simple) noisy channels is not valid anymore. However, the principle can be recovered if we consider more complicated types of channels, which in Information Theory are known as channels with memory and feedback. We show that there is a complete correspondence between interactive systems and this kind of channel. Furthermore, we show that the capacity of the channels associated to such systems is a continuous function of the Kantorovich metric.

1 Introduction

Information leakage refers to the problem that the observable parts of the behavior of a system may reveal information that we would like to keep secret. In recent years, there has been a growing interest in the quantitative aspects of this problem, partly because it is convenient to represent the partial knowledge of the secrets as a probability distribution, and partly because the mechanisms to protect the information may use randomization to obfuscate the relation between the secrets and the observables.

Among the quantitative approaches, some of the most popular ones are based on Information Theory [5, 12, 4, 16]. The system is interpreted as an information-theoretic channel, where the secrets are the input and the observables are the output. The channel matrix is constituted by the conditional probabilities $p(b|a)$, defined as the measure of the executions that give observable $b$ within those which contain the secret $a$. The leakage is represented by the mutual information, and the worst-case leakage by the capacity of the channel.

In the above works, the secret value is assumed to be chosen at the beginning of the computation. In this paper, we are interested in interactive systems, i.e. systems in which secrets and observables can alternate during the computation and influence each other. Examples of interactive protocols include auction protocols like [21, 18, 17]. Some of these have become very popular thanks to their integration in Internet-based electronic commerce platforms [9, 10, 14]. As for interactive programs, examples include web servers, GUI applications, and command-line programs [3].

We investigate the applicability of the information-theoretic approach to interactive systems. In [8] it was proposed to define the matrix elements $p(b|a)$ as the measure of the traces with (secret, observable)-projection $(a, b)$, divided by the measure of the traces with secret projection $a$. This follows the definition of conditional probability in terms of joint and marginal probability. However, it does not define an information-theoretic channel. In fact, by definition a channel should be invariant with respect to the input distribution, and such a construction is not, as shown by the following example.

* This work has been partially supported by the project ANR-09-BLAN-0345 CPP and by the INRIA DRI Équipe Associée PRINTEMPS.

Example 1. Figure 1 represents a web-based interaction between one seller and two possible buyers, rich and poor. The seller offers two different products, cheap and expensive, with given probabilities. Once the product is offered, each buyer may try to buy the product, with a certain probability. For simplicity we assume that the buyers' offers are exclusive. We assume that the offers are observable, in the sense that they are made public on the website, while the identity of the buyer that actually buys the product should be secret to an external observer. The symbols $r, s, t, \bar r, \bar s, \bar t$ represent the probabilities, with the convention that $\bar r = 1 - r$ (and similarly for $\bar s$ and $\bar t$).

[Figure 1. Interactive system of Example 1: the seller offers cheap with probability $r$ and expensive with probability $\bar r$; after cheap, poor buys with probability $s$ and rich with probability $\bar s$; after expensive, poor buys with probability $t$ and rich with probability $\bar t$.]

Following [8] we can compute the conditional probabilities as $p(b|a) = p(a,b)/p(a)$, thus obtaining the matrix in Table 1.

However, the matrix is not invariant with respect to the input distribution. For instance, if we fix $r = \bar r = 0.5$ and consider two different input distributions, obtained by varying the values of $(s, t)$, we get two different matrices of conditional probabilities, which are represented in Table 2. Hence when the secrets occur after the observables we cannot consider the conditional probabilities as representing a (classical) channel, and we cannot apply the standard information-theoretic concepts. In particular, we cannot adopt the (classical) capacity to represent the worst-case leakage, since the capacity is defined by maximizing over all possible input distributions while keeping the channel matrix fixed.

Table 1. Conditional probabilities of Example 1:

          cheap                                  expensive
  poor    $rs/(rs + \bar r t)$                   $\bar r t/(rs + \bar r t)$
  rich    $r\bar s/(r\bar s + \bar r \bar t)$    $\bar r \bar t/(r\bar s + \bar r \bar t)$

The first contribution of this paper is to consider an extension of the theory of channels which makes the information-theoretic approach applicable also to the case of interactive systems. It turns out that a richer notion of channel, known in Information Theory as channels with memory and feedback, serves our purposes. The dependence of inputs on previous outputs corresponds to feedback, and the dependence of outputs on previous inputs and outputs corresponds to memory.

Table 2. Two different channel matrices induced by two different input distributions:

(a) $r = 1/2$, $s = 2/5$, $t = 3/5$:

          cheap    expensive    input distribution
  poor    2/5      3/5          $p(\mathit{poor}) = 1/2$
  rich    3/5      2/5          $p(\mathit{rich}) = 1/2$

(b) $r = 1/2$, $s = 1/10$, $t = 3/10$:

          cheap    expensive    input distribution
  poor    1/4      3/4          $p(\mathit{poor}) = 1/5$
  rich    9/16     7/16         $p(\mathit{rich}) = 4/5$
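To see the dependence concretely, the following sketch (illustrative only; the function name and the use of exact fractions are our own) recomputes the matrices of Table 2 from the joint probabilities read off Figure 1.

```python
from fractions import Fraction as F

def conditional_matrix(r, s, t):
    """Example 1: the seller offers 'cheap' with probability r and 'expensive'
    with probability 1-r; after 'cheap' the buyer is 'poor' with probability s,
    after 'expensive' with probability t (otherwise 'rich').
    Returns p(b|a) = p(a,b)/p(a) for a in {poor, rich}, b in {cheap, expensive}."""
    joint = {
        ('poor', 'cheap'):     r * s,
        ('rich', 'cheap'):     r * (1 - s),
        ('poor', 'expensive'): (1 - r) * t,
        ('rich', 'expensive'): (1 - r) * (1 - t),
    }
    marginal = {a: joint[(a, 'cheap')] + joint[(a, 'expensive')]
                for a in ('poor', 'rich')}
    return {ab: joint[ab] / marginal[ab[0]] for ab in joint}

# Same r, two different choices of (s, t): the "channel matrix" changes,
# so it cannot be a classical, input-independent channel.
print(conditional_matrix(F(1, 2), F(2, 5), F(3, 5)))    # matrix of Table 2(a)
print(conditional_matrix(F(1, 2), F(1, 10), F(3, 10)))  # matrix of Table 2(b)
```

Running it with the two parameter choices of Table 2 yields two different matrices, confirming that the construction is not input-independent.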


A second contribution of our work is the proof that the channel capacity is a continuous function of the Kantorovich metric on interactive systems. This was pointed out also in [8]; however, their construction does not work in our case because (as far as we understand) it assumes that the probability of a secret action, at any point of the computation, is not 0. This assumption is not guaranteed in our case and therefore we had to proceed differently.

A more complete version of this paper (with proofs) is available online [1].

2 Preliminaries

2.1 Concepts from Information Theory

For more detailed information on this part we refer to [6]. Let $A, B$ denote two random variables with corresponding probability distributions $p_A(\cdot), p_B(\cdot)$, respectively. We shall omit the subscripts when they are clear from the context. Let $\mathcal{A} = \{a_1, \ldots, a_n\}$, $\mathcal{B} = \{b_1, \ldots, b_m\}$ denote, respectively, the sets of possible values for $A$ and for $B$.

The entropy of $A$ is defined as $H(A) = -\sum_{a \in \mathcal{A}} p(a) \log p(a)$ and it measures the uncertainty of $A$. It takes its minimum value $H(A) = 0$ when $p_A(\cdot)$ is a Dirac delta. The maximum value $H(A) = \log |\mathcal{A}|$ is obtained when $p_A(\cdot)$ is the uniform distribution. Usually the base of the logarithm is set to be 2 and the entropy is measured in bits. The conditional entropy of $A$ given $B$ is $H(A|B) = -\sum_{b \in \mathcal{B}} p(b) \sum_{a \in \mathcal{A}} p(a|b) \log p(a|b)$, and it measures the uncertainty of $A$ when $B$ is known. We can prove that $0 \le H(A|B) \le H(A)$. The minimum value, 0, is obtained when $A$ is completely determined by $B$. The maximum value $H(A)$ is obtained when $A$ and $B$ are independent.

The mutual information between $A$ and $B$ is defined as $I(A;B) = H(A) - H(A|B)$, and it measures the amount of information about $A$ that we gain by observing $B$. It can be shown that $I(A;B) = I(B;A)$ and $0 \le I(A;B) \le H(A)$.

The entropy and mutual information respect the chain laws. Namely, given a sequence of random variables $A_1, A_2, \ldots, A_k$ and $B$, we have:

$$H(A_1, A_2, \ldots, A_k) = \sum_{i=1}^{k} H(A_i \mid A_1, \ldots, A_{i-1}) \qquad (1)$$

$$I(A_1, A_2, \ldots, A_k; B) = \sum_{i=1}^{k} I(A_i; B \mid A_1, \ldots, A_{i-1}) \qquad (2)$$
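As a quick sanity check on these definitions (a self-contained sketch with invented numbers, not taken from the paper), the following helpers compute entropies and mutual information from a finite joint distribution and verify the chain rule (2) for $k = 2$:

```python
from math import log2
from itertools import product

def marginal(p, idx):
    """Marginal of the joint distribution p (dict: outcome tuple -> probability)
    over the coordinates listed in idx."""
    m = {}
    for outcome, pr in p.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + pr
    return m

def entropy(p):
    return -sum(pr * log2(pr) for pr in p.values() if pr > 0)

def mi(p, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), computed from the joint p."""
    return entropy(marginal(p, a)) + entropy(marginal(p, b)) - entropy(marginal(p, a + b))

def cmi(p, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(C) - H(A,B,C)."""
    H = lambda idx: entropy(marginal(p, idx))
    return H(a + c) + H(b + c) - H(c) - H(a + b + c)

# Invented joint distribution over (A1, A2, B): A1, A2 uniform and independent,
# B copies A1 with probability 3/4.
p = {(a1, a2, b): (3/16 if b == a1 else 1/16)
     for a1, a2, b in product((0, 1), repeat=3)}

lhs = mi(p, (0, 1), (2,))                            # I(A1, A2 ; B)
rhs = mi(p, (0,), (2,)) + cmi(p, (1,), (2,), (0,))   # I(A1; B) + I(A2; B | A1)
assert abs(lhs - rhs) < 1e-12                        # chain rule (2) with k = 2
```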

A (discrete memoryless) channel is a tuple $(\mathcal{A}, \mathcal{B}, p(\cdot|\cdot))$, where $\mathcal{A}, \mathcal{B}$ are the sets of input and output symbols, respectively, and $p(b_j \mid a_i)$ is the probability of observing the output symbol $b_j$ when the input symbol is $a_i$. An input distribution $p(a_i)$ over $\mathcal{A}$ determines, together with the channel, the joint distribution $p(a_i, b_j) = p(b_j \mid a_i) \cdot p(a_i)$ and consequently $I(A;B)$. The maximum of $I(A;B)$ over all possible input distributions is the channel's capacity. Shannon's famous result states that the capacity coincides with the maximum rate at which information can be transmitted using the channel.
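For a discrete memoryless channel the capacity can be approximated numerically with the standard Blahut-Arimoto iteration; a minimal sketch (ours, not part of the paper, which is concerned with the feedback case) is:

```python
import numpy as np

def blahut_arimoto(W, iterations=200):
    """Capacity (in bits) of a discrete memoryless channel with row-stochastic
    matrix W[a, b] = p(b|a), via the Blahut-Arimoto fixed-point iteration."""
    def divergence_rows(p):
        q = np.broadcast_to(p @ W, W.shape)       # output distribution q(b), one row per input
        logs = np.zeros_like(W)
        mask = W > 0                              # 0 log 0 = 0 convention
        logs[mask] = np.log2(W[mask] / q[mask])
        return np.sum(W * logs, axis=1)           # D(a) = sum_b p(b|a) log2(p(b|a)/q(b))

    p = np.full(W.shape[0], 1.0 / W.shape[0])     # start from the uniform input distribution
    for _ in range(iterations):
        p = p * np.exp2(divergence_rows(p))
        p /= p.sum()
    return float(np.sum(p * divergence_rows(p)))  # = I(A;B) at the returned input distribution

# Binary symmetric channel with crossover 0.1: capacity 1 - H(0.1), about 0.531 bits.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
print(round(blahut_arimoto(bsc), 3))
```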

In this paper we consider input and output sequences instead of just symbols.


Convention 1. Let $\mathcal{A} = \{a_1, \ldots, a_n\}$ be a finite set of $n$ different symbols (alphabet). When we have a sequence of symbols (ordered in time), we use a Greek letter with a subscript, $\alpha_t$, to denote the symbol at time $t$. The notation $\alpha^t$ stands for the sequence $\alpha_1 \alpha_2 \ldots \alpha_t$. For instance, in the sequence $a_3 a_7 a_5$, we have $\alpha_2 = a_7$ and $\alpha^2 = a_3 a_7$.

Convention 2. Let $X$ be a random variable. $X^t$ denotes the sequence of $t$ consecutive occurrences $X_1, \ldots, X_t$ of the random variable $X$.

When the channel is used repeatedly, the discrete memoryless channel described above represents the case in which the behavior of the channel at the present time does not depend upon the past history of inputs and outputs. If this assumption does not hold, then we have a channel with memory. Furthermore, if the outputs from the channel can be fed back to the encoder, thus influencing the generation of the next input symbol, then the channel is said to be with feedback; otherwise it is without feedback.

Equation (3) below makes explicit the probabilistic behavior of channels with respect to these classifications. Consider a general channel from $\mathcal{A}$ to $\mathcal{B}$ with associated random variables $A$ for the input and $B$ for the output. Using the notation introduced in Convention 1, the channel behavior after $T$ uses can be fully described by the joint probability $p(\alpha^T, \beta^T)$.

Using probability laws we derive:

$$p(\alpha^T, \beta^T) = \prod_{t=1}^{T} p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\, p(\beta_t \mid \alpha^{t}, \beta^{t-1}) \qquad \text{(by the expansion law)} \qquad (3)$$

The first term $p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})$ indicates that the probability of $\alpha_t$ depends not only on $\alpha^{t-1}$, but also on $\beta^{t-1}$ (feedback). The second term $p(\beta_t \mid \alpha^{t}, \beta^{t-1})$ indicates that the probability of each $\beta_t$ depends on the previous history of inputs $\alpha^t$ and outputs $\beta^{t-1}$ (memory).

If the channel is without feedback, then we have that $p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) = p(\alpha_t \mid \alpha^{t-1})$, and if the channel is without memory, then we have also $p(\beta_t \mid \alpha^{t}, \beta^{t-1}) = p(\beta_t \mid \alpha_t)$. From these we derive $p(\beta^T \mid \alpha^T) = \prod_{t=1}^{T} p(\beta_t \mid \alpha_t)$, which is the classic equation for discrete memoryless channels without feedback.
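Equation (3) can be read as a recipe for building the joint distribution of input and output histories from the two families of kernels. The sketch below (with hypothetical kernels, purely for illustration) unrolls the product for binary alphabets and $T = 2$ and checks that the result is a probability distribution:

```python
from itertools import product

A, B = ('a1', 'a2'), ('b1', 'b2')
T = 2

def p_input(a, a_hist, b_hist):
    """Hypothetical p(alpha_t | alpha^{t-1}, beta^{t-1}): with feedback, deterministically
    echo the index of the last observed output; uniform at the first step."""
    if b_hist:
        return 1.0 if A.index(a) == B.index(b_hist[-1]) else 0.0
    return 0.5

def p_output(b, a_hist, b_hist):
    """Hypothetical p(beta_t | alpha^t, beta^{t-1}): the output copies the current
    input with probability 0.8, regardless of the earlier history."""
    return 0.8 if B.index(b) == A.index(a_hist[-1]) else 0.2

joint = {}
for history in product(A, B, A, B):              # history = (alpha_1, beta_1, alpha_2, beta_2)
    pr, a_hist, b_hist = 1.0, (), ()
    for t in range(T):
        a, b = history[2 * t], history[2 * t + 1]
        pr *= p_input(a, a_hist, b_hist)         # feedback term p(alpha_t | alpha^{t-1}, beta^{t-1})
        a_hist += (a,)
        pr *= p_output(b, a_hist, b_hist)        # memory term   p(beta_t  | alpha^t,    beta^{t-1})
        b_hist += (b,)
    joint[history] = pr

# equation (3) indeed yields a probability distribution over the histories
assert abs(sum(joint.values()) - 1.0) < 1e-12
```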

Let $(\mathcal{V}, \mathcal{K})$ be a Borel space and let $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ and $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}})$ be Polish spaces equipped with their Borel $\sigma$-algebras. Let $\rho(dx \mid v)$ be a family of measures on $\mathcal{X}$ given $\mathcal{V}$. Then $\rho(dx \mid v)$ is a stochastic kernel if and only if $\rho(\cdot \mid v)$ is a random variable from $\mathcal{V}$ into $\mathcal{P}(\mathcal{X})$, the space of probability measures on $\mathcal{X}$.

2.2 Probabilistic automata

A function $\mu : S \to [0,1]$ is a discrete probability distribution on a countable set $S$ if $\sum_{s \in S} \mu(s) = 1$ and $\mu(s) \ge 0$ for all $s$. The set of all discrete probability distributions on $S$ is $\mathcal{D}(S)$.

A probabilistic automaton [15] is a quadruple $M = (S, \mathcal{L}, \hat s, \vartheta)$ where $S$ is a countable set of states, $\mathcal{L}$ a finite set of labels or actions, $\hat s$ the initial state, and $\vartheta$ a transition function $\vartheta : S \to \wp_f(\mathcal{D}(\mathcal{L} \times S))$. Here $\wp_f(X)$ is the set of all finite subsets of $X$. If $\vartheta(s) = \emptyset$ then $s$ is a terminal state. We write $s \to \mu$ for $\mu \in \vartheta(s)$, $s \in S$.


Moreover, we write $s \xrightarrow{\ell} r$ for $s, r \in S$ whenever $s \to \mu$ and $\mu(\ell, r) > 0$. A fully probabilistic automaton is a probabilistic automaton satisfying $|\vartheta(s)| \le 1$ for all states. When $\vartheta(s) \neq \emptyset$ we overload the notation and denote by $\vartheta(s)$ the distribution outgoing from $s$.

A path in a probabilistic automaton is a sequence $\sigma = s_0 \xrightarrow{\ell_1} s_1 \xrightarrow{\ell_2} \cdots$ where $s_i \in S$, $\ell_i \in \mathcal{L}$ and $s_i \xrightarrow{\ell_{i+1}} s_{i+1}$. A path can be finite, in which case it ends with a state. A path is complete if it is either infinite or finite ending in a terminal state. Given a finite path $\sigma$, $\mathit{last}(\sigma)$ denotes its last state. Let $\mathit{Paths}_s(M)$ denote the set of all paths, $\mathit{Paths}^\star_s(M)$ the set of all finite paths, and $\mathit{CPaths}_s(M)$ the set of all complete paths of an automaton $M$, starting from the state $s$. We will omit $s$ if $s = \hat s$. Paths are ordered by the prefix relation, which we denote by $\le$. The trace of a path is the sequence of actions, finite or infinite, obtained by removing the states; hence for the above $\sigma$ we have $\mathit{trace}(\sigma) = \ell_1 \ell_2 \ldots$. If $\mathcal{L}' \subseteq \mathcal{L}$, then $\mathit{trace}_{\mathcal{L}'}(\sigma)$ is the projection of $\mathit{trace}(\sigma)$ on the elements of $\mathcal{L}'$.

Let $M = (S, \mathcal{L}, \hat s, \vartheta)$ be a (fully) probabilistic automaton, $s \in S$ a state, and let $\sigma \in \mathit{Paths}^\star_s(M)$ be a finite path starting in $s$. The cone generated by $\sigma$ is the set of complete paths $\langle \sigma \rangle = \{\sigma' \in \mathit{CPaths}_s(M) \mid \sigma \le \sigma'\}$. Given a fully probabilistic automaton $M = (S, \mathcal{L}, \hat s, \vartheta)$ and a state $s$, we can calculate the probability value, denoted by $P_s(\sigma)$, of any finite path $\sigma$ starting in $s$ as follows: $P_s(s) = 1$ and $P_s(\sigma \xrightarrow{\ell} s') = P_s(\sigma)\, \mu(\ell, s')$, where $\mathit{last}(\sigma) \to \mu$.

Let $\Omega_s \triangleq \mathit{CPaths}_s(M)$ be the sample space, and let $\mathcal{F}_s$ be the smallest $\sigma$-algebra generated by the cones. Then $P$ induces a unique probability measure on $\mathcal{F}_s$ (which we will also denote by $P_s$) such that $P_s(\langle \sigma \rangle) = P_s(\sigma)$ for every finite path $\sigma$ starting in $s$. For $s = \hat s$ we write $P$ instead of $P_{\hat s}$.

Given a probability space $(\Omega, \mathcal{F}, P)$ and two events $A, B \in \mathcal{F}$ with $P(B) > 0$, the conditional probability of $A$ given $B$, $P(A \mid B)$, is defined as $P(A \cap B)/P(B)$.
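As a small illustration of these definitions (the automaton below is invented for the purpose and is not taken from the paper), a fully probabilistic automaton can be represented as a map from states to their single outgoing distribution, and $P_s(\sigma)$ computed by multiplying transition probabilities along the path:

```python
# A fully probabilistic automaton: theta maps each state to its single outgoing
# distribution over (label, next state) pairs; terminal states map to None.
theta = {
    's0': {('l1', 's1'): 0.5, ('l2', 's2'): 0.5},
    's1': {('l3', 's3'): 1.0},
    's2': None,   # terminal
    's3': None,   # terminal
}

def path_probability(path):
    """P_s(sigma) for a finite path [s0, (l1, s1), (l2, s2), ...]: the product
    of the probabilities of its transitions (also the measure of the cone <sigma>)."""
    prob, current = 1.0, path[0]
    for label, nxt in path[1:]:
        prob *= theta[current].get((label, nxt), 0.0)
        current = nxt
    return prob

print(path_probability(['s0', ('l1', 's1'), ('l3', 's3')]))  # 0.5
print(path_probability(['s0', ('l2', 's2')]))                # 0.5
```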

3 Discrete channels with memory and feedback

We adopt the model proposed in [19] for discrete channels with memory and feedback.

Such a model, represented in Figure 2, can be decomposed into sequential components as follows. At time $t$ the internal channel's behavior is represented by the conditional probabilities $p(\beta_t \mid \alpha^t, \beta^{t-1})$. The internal channel takes the input $\alpha_t$ and, according to the history of inputs and outputs up to that moment, $\alpha^t, \beta^{t-1}$, produces an output symbol $\beta_t$. The output is then fed back to the encoder with delay one. On the other side, at time $t$ the encoder takes the message and the past output symbols $\beta^{t-1}$, and produces a channel input symbol $\alpha_t$. At the final time $T$ the decoder takes all the channel outputs $\beta^T$ and produces the decoded message $\hat W$. The order is the following:

Message $W$, $\alpha_1, \beta_1, \alpha_2, \beta_2, \ldots, \alpha_T, \beta_T$, Decoded Message $\hat W$

Let us describe such a channel in more detail. Let $\mathcal{A}$ and $\mathcal{B}$ be two finite sets. Let $\{A_t\}_{t=1}^{T}$ (channel's input) and $\{B_t\}_{t=1}^{T}$ (channel's output) be families of random variables in $\mathcal{A}$ and $\mathcal{B}$ respectively. Moreover, let $\mathcal{A}^T$ and $\mathcal{B}^T$ represent their $T$-fold product spaces. A channel is a family of stochastic kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$.

Let $\mathcal{F}_t$ be the set of all measurable maps $\varphi_t : \mathcal{B}^{t-1} \to \mathcal{A}$ endowed with a probability distribution, and let $F_t$ be the corresponding random variable. Let $\mathcal{F}^T$ and $F^T$ denote the Cartesian product on the domain and the corresponding random variable, respectively.


[Figure 2. Model for a discrete channel with memory and feedback: a message $W$ selects the code functions $\varphi^T$; at time $t$ the encoder emits $\alpha_t = \varphi_t(\beta^{t-1})$, the channel $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ produces $\beta_t$, which is fed back to the encoder with delay one; at time $T+1$ the decoder outputs $\hat W = \gamma(\beta^T)$.]

A channel code function is an element $\varphi^T = (\varphi_1, \ldots, \varphi_T) \in \mathcal{F}^T$.

Note that, by probability laws, $p(\varphi^T) = \prod_{t=1}^{T} p(\varphi_t \mid \varphi^{t-1})$. Hence the distribution on $\mathcal{F}^T$ is uniquely determined by a sequence $\{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$. We will use the notation $\varphi^t(\beta^{t-1})$ to represent the $\mathcal{A}$-valued $t$-tuple $(\varphi_1, \varphi_2(\beta^1), \ldots, \varphi_t(\beta^{t-1}))$.

In Information Theory this kind of channel is used to encode and transmit messages. If $\mathcal{W}$ is a message set of cardinality $M$ with typical element $w$, endowed with a probability distribution, a channel code is a set of $M$ channel code functions $\varphi^T[w]$, interpreted as follows: for message $w$, if at time $t$ the channel feedback is $\beta^{t-1}$, then the channel encoder outputs $\varphi_t[w](\beta^{t-1})$. A channel decoder is a map from $\mathcal{B}^T$ to $\mathcal{W}$ which attempts to reconstruct the input message after observing all the output history $\beta^T$ from the channel.

3.1 Directed information and capacity of channels with feedback

In classical Information Theory, the channel capacity, which is related to the channel's transmission rate by Shannon's fundamental result, can be obtained as the supremum of the mutual information over all possible input distributions. In the presence of feedback, however, this correspondence does not hold anymore. More specifically, mutual information no longer represents the information flow from $\alpha^T$ to $\beta^T$. Intuitively, this is due to the fact that mutual information expresses correlation, and therefore it is increased by feedback. But feedback, i.e. the way the output influences the next input, is part of the a priori knowledge, and therefore should not be counted when we measure the output's contribution to the reduction of the uncertainty about the input. If we want to maintain the correspondence with the transmission rate and with information flow, we need to replace mutual information with directed information [13].

Definition 1. In a channel with feedback, the directed information from input $A^T$ to output $B^T$ is defined as $I(A^T \to B^T) = \sum_{t=1}^{T} I(\alpha^t; \beta_t \mid \beta^{t-1})$. In the other direction, the directed information from $B^T$ to $A^T$ is defined as $I(B^T \to A^T) = \sum_{t=1}^{T} I(\beta^{t-1}; \alpha_t \mid \alpha^{t-1})$.

Note that the directed information defined above is not symmetric: the flow from $A^T$ to $B^T$ takes into account the correlation between $\alpha^t$ and $\beta_t$, while the flow from $B^T$ to $A^T$ is based on the correlation between $\beta^{t-1}$ and $\alpha_t$. Intuitively, this is because $\alpha_t$ influences $\beta_t$, but, in the other direction, it is $\beta^{t-1}$ that influences $\alpha_t$.

It can be proved [19] that $I(A^T; B^T) = I(A^T \to B^T) + I(B^T \to A^T)$. If a channel does not have feedback, then $I(B^T \to A^T) = 0$ and $I(A^T; B^T) = I(A^T \to B^T)$.

In a channel with feedback the information transmitted is the directed information, not the mutual information. The following example should help in understanding why.

Example 2. Consider the discrete memoryless channel with input alphabet $\mathcal{A} = \{a_1, a_2\}$ and output alphabet $\mathcal{B} = \{b_1, b_2\}$ whose matrix is represented in Table 3.

Table 3. Channel matrix for Example 2:

        b1     b2
  a1    0.5    0.5
  a2    0.5    0.5

Suppose that the channel is used with feedback, in such a way that, for all $t$, $\alpha_{t+1} = a_1$ if $\beta_t = b_1$, and $\alpha_{t+1} = a_2$ if $\beta_t = b_2$. It is easy to show that if $t \ge 2$ then $I(A^t; B^t) \neq 0$. However, there is no leakage from $A^t$ to $B^t$, since the rows of the matrix are all equal. We have indeed that $I(A^t \to B^t) = 0$, and the mutual information $I(A^t; B^t)$ is only due to the feedback information flow $I(B^t \to A^t)$.
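This can be checked numerically. A self-contained sketch (ours; $T = 2$, with the symbols encoded as 0/1) builds the joint distribution over $(A_1, B_1, A_2, B_2)$ induced by the feedback strategy above and compares mutual and directed information:

```python
from math import log2
from itertools import product

def H(joint, idx):
    """Entropy of the marginal of joint (dict: tuple -> prob) on the coordinates idx."""
    m = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + pr
    return -sum(pr * log2(pr) for pr in m.values() if pr > 0)

# Joint distribution over (A1, B1, A2, B2): A1 uniform, B1 uniform and independent
# of A1 (all rows of Table 3 are equal), A2 copies B1 (the feedback strategy),
# B2 uniform and independent of everything else.
joint = {}
for a1, b1, b2 in product((0, 1), repeat=3):
    joint[(a1, b1, b1, b2)] = 0.125               # the third coordinate is a2 = b1

# I(A^2; B^2) with A^2 = (A1, A2) at coordinates (0, 2) and B^2 = (B1, B2) at (1, 3)
mutual = H(joint, (0, 2)) + H(joint, (1, 3)) - H(joint, (0, 1, 2, 3))
# I(A^2 -> B^2) = I(A^1; B1) + I(A^2; B2 | B1)
directed = (H(joint, (0,)) + H(joint, (1,)) - H(joint, (0, 1))) + \
           (H(joint, (0, 2, 1)) + H(joint, (3, 1)) - H(joint, (1,)) - H(joint, (0, 2, 1, 3)))
print(mutual, directed)                           # 1.0 and 0.0
```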

The concept of capacity is generalized for channels with feedback as follows. Let $\mathcal{D}_T = \{\{p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\}_{t=1}^{T}\}$ be the set of all input distributions. For finite $T$, the capacity of a channel $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ is:

$$C_T = \sup_{\mathcal{D}_T} \frac{1}{T}\, I(A^T \to B^T) \qquad (4)$$

4 Interactive systems as channels with memory and feedback

(General) Interactive Information Hiding Systems ([2]) are a variant of probabilistic automata in which we separate the actions into secret and observable ones; "interactive" means that secret and observable actions can interleave and influence each other.

Definition 2. A general IIHS is a quadruple $\mathcal{I} = (M, \mathcal{A}, \mathcal{B}, \mathcal{L}_\tau)$, where $M$ is a probabilistic automaton $(S, \mathcal{L}, \hat s, \vartheta)$, $\mathcal{L} = \mathcal{A} \cup \mathcal{B} \cup \mathcal{L}_\tau$ where $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{L}_\tau$ are pairwise disjoint sets of secret, observable, and internal actions respectively, and $\vartheta(s) \subseteq \mathcal{D}((\mathcal{B} \cup \mathcal{L}_\tau) \times S)$ implies $|\vartheta(s)| \le 1$, for all $s$. The condition on $\vartheta$ ensures that all observable transitions are fully probabilistic.

Assumption. In this paper we assume that general IIHSs are normalized, i.e. once unfolded, all the transitions between two consecutive levels have either secret labels only, or observable labels only. Moreover, the occurrences of secret and observable labels alternate between levels. We will call secret states the states from which only secret-labeled transitions are possible, and observable states the others. Finally, we assume that for every $s$ and $\ell$ there exists a unique $r$ such that $s \xrightarrow{\ell} r$. Under this assumption the traces of a computation determine the final state, as expressed by the next proposition. In the following, $trace_{\mathcal{A}}$ and $trace_{\mathcal{B}}$ indicate the projection of the traces on secret and observable actions, respectively. Given a general IIHS, it is always possible to find an equivalent one that satisfies these assumptions; the interested reader can find in [1] the formal definition of the transformation.


Proposition 1. Let $\mathcal{I} = (M, \mathcal{A}, \mathcal{B}, \mathcal{L}_\tau)$ be a general IIHS. Consider two paths $\sigma$ and $\sigma'$. Then $trace_{\mathcal{A}}(\sigma) = trace_{\mathcal{A}}(\sigma')$ and $trace_{\mathcal{B}}(\sigma) = trace_{\mathcal{B}}(\sigma')$ implies $\sigma = \sigma'$.

In the following, we will consider two particular cases: the fully probabilistic IIHSs, where there is no nondeterminism, and the secret-nondeterministic IIHSs, where each secret choice is fully nondeterministic. The latter will be called simply IIHSs.

Definition 3. Let $\mathcal{I} = ((S, \mathcal{L}, \hat s, \vartheta), \mathcal{A}, \mathcal{B}, \mathcal{L}_\tau)$ be a general IIHS. Then $\mathcal{I}$ is:

– fully probabilistic if $\vartheta(s) \subseteq \mathcal{D}(\mathcal{A} \times S)$ implies $|\vartheta(s)| \le 1$ for each $s \in S$;
– secret-nondeterministic if, for each $s \in S$, $\vartheta(s) \subseteq \mathcal{D}(\mathcal{A} \times S)$ implies that there exist states $s_i$ such that $\vartheta(s) = \{\delta(a_i, s_i)\}_{i=1}^{n}$, where $\delta(a_i, s_i)$ denotes the point (Dirac) distribution on $(a_i, s_i)$.

We show now how to construct a channel with memory and feedback from an IIHS. We will see that an IIHS corresponds precisely to a channel as determined by its stochastic kernel, while a fully probabilistic IIHS determines, additionally, the input distribution. In the following, we consider an IIHS $\mathcal{I} = ((S, \mathcal{L}, \hat s, \vartheta), \mathcal{A}, \mathcal{B}, \mathcal{L}_\tau)$ in normalized form. Given a path $\sigma$ of length $2t-1$, we denote $trace_{\mathcal{A}}(\sigma)$ by $\alpha^t$, and $trace_{\mathcal{B}}(\sigma)$ by $\beta^{t-1}$.

Definition 4. For each $t$, the channel's stochastic kernel corresponding to $\mathcal{I}$ is defined as $p(\beta_t \mid \alpha^t, \beta^{t-1}) = \vartheta(q)(\beta_t, q')$, where $q$ is the state reached from the root via the path $\sigma$ whose input trace is $\alpha^t$ and whose output trace is $\beta^{t-1}$.

Note that $q$ and $q'$ in the previous definition are well defined: by Proposition 1, $q$ is unique, and since the choice of $\beta_t$ is fully probabilistic, $q'$ is also unique.

If $\mathcal{I}$ is fully probabilistic, then it also determines the input distribution and the dependency of $\alpha_t$ upon $\beta^{t-1}$ (feedback) and $\alpha^{t-1}$.

Definition 5. If $\mathcal{I}$ is fully probabilistic, the associated channel has a conditional input distribution for each $t$, defined as $p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) = \vartheta(q)(\alpha_t, q')$, where $q$ is the state reached from the root via the path $\sigma$ whose input trace is $\alpha^{t-1}$ and whose output trace is $\beta^{t-1}$.
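Definitions 4 and 5 amount to reading the kernels directly off the unfolded tree. A sketch of this extraction (the tree encoding, the toy probabilities and the helper names are ours) is:

```python
# A normalized, fully probabilistic IIHS as an unfolded tree: each node carries
# its outgoing distribution over (label, child) pairs; secret and observable
# levels alternate, starting with a secret level (toy labels and probabilities).
def node(edges):
    return {'edges': edges}

leaf = node({})
iihs = node({                                                   # secret level (t = 1)
    'x': (0.3, node({'o1': (0.6, leaf), 'o2': (0.4, leaf)})),   # then an observable level
    'y': (0.7, node({'o1': (0.1, leaf), 'o2': (0.9, leaf)})),
})

def interleave(alphas, betas):
    """The trace a1 b1 a2 b2 ...; alphas may be one element longer than betas."""
    trace = []
    for i, a in enumerate(alphas):
        trace.append(a)
        if i < len(betas):
            trace.append(betas[i])
    return trace

def state_after(root, trace):
    """Follow a trace from the root; by Proposition 1 the reached state is unique."""
    q = root
    for label in trace:
        _, q = q['edges'][label]
    return q

def output_kernel(root, alphas, betas):
    """Definition 4: p(beta_t | alpha^t, beta^{t-1}), read off the state reached via
    the path with secret trace alpha^t and observable trace beta^{t-1}."""
    q = state_after(root, interleave(alphas, betas))
    return {b: pr for b, (pr, _) in q['edges'].items()}

def input_distribution(root, alphas, betas):
    """Definition 5: p(alpha_t | alpha^{t-1}, beta^{t-1}) (fully probabilistic IIHSs only)."""
    q = state_after(root, interleave(alphas, betas))
    return {a: pr for a, (pr, _) in q['edges'].items()}

print(input_distribution(iihs, [], []))   # {'x': 0.3, 'y': 0.7}
print(output_kernel(iihs, ['x'], []))     # {'o1': 0.6, 'o2': 0.4}
```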

4.1 Lifting the channel inputs to reaction functions

Definitions 4 and 5 define the joint probabilities $p(\alpha^t, \beta^t)$ for a fully probabilistic IIHS. We still need to show in what sense these define an information-theoretic channel.

The $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ determined by the IIHS correspond to a channel's stochastic kernel. The problem resides in the conditional probabilities $\{p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\}_{t=1}^{T}$. In an information-theoretic channel, the value of $\alpha_t$ is determined in the encoder by a deterministic function $\varphi_t(\beta^{t-1})$. However, inside the encoder there is no possibility for a probabilistic description of $\alpha_t$. Furthermore, in our setting the concept of encoder makes no sense as there is no information to encode. A solution to this problem is to externalize the probabilistic behavior of $\alpha_t$: the code functions become simple reaction functions $\varphi_t$ that depend only on $\beta^{t-1}$ (the message $w$ does not play a role any more), and these reaction functions are endowed with a probability distribution that generates the probabilistic behavior of the values of $\alpha_t$.


Definition 6. A reactor is a distribution on reaction functions, i.e., a stochastic kernel $\{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$. A reactor $R$ is consistent with a fully probabilistic IIHS $\mathcal{I}$ if it induces the compatible distribution $Q(\varphi^T, \alpha^T, \beta^T)$ such that, for every $1 \le t \le T$, $Q(\alpha_t \mid \alpha^{t-1}, \beta^{t-1}) = p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})$, where the latter is the probability distribution induced by $\mathcal{I}$.

The main result of this section states that for any fully probabilistic IIHS there is a reactor that generates the probabilistic behavior of the IIHS.

Theorem 3. Given a fully probabilistic IIHS $\mathcal{I}$, we can construct a channel with memory and feedback, and a probability distribution $Q(\varphi^T, \alpha^T, \beta^T)$, which corresponds to $\mathcal{I}$ in the sense that, for every $t$, $\alpha^t$ and $\beta^t$, with $1 \le t \le T$, $Q(\alpha^t, \beta^t) \stackrel{\text{def}}{=} \sum_{\varphi^T} Q(\varphi^T, \alpha^t, \beta^t) = p(\alpha^t, \beta^t)$ holds, where $p(\alpha^t, \beta^t)$ is the joint probability of input and output traces induced by $\mathcal{I}$.

Corollary 1. Let $\mathcal{I}$ be a fully probabilistic IIHS. Let $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ be a sequence of stochastic kernels and $\{p(\alpha_t \mid \alpha^{t-1}, \beta^{t-1})\}_{t=1}^{T}$ a sequence of input distributions defined by $\mathcal{I}$ according to Definitions 4 and 5. Then the reactor $R = \{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$ compatible with $\mathcal{I}$ is given by:

$$p(\varphi_1) = p(\alpha_1 \mid \alpha^0, \beta^0) = p(\alpha_1) \qquad (5)$$

$$p(\varphi_t \mid \varphi^{t-1}) = \prod_{\beta^{t-1}} p\big(\varphi_t(\beta^{t-1}) \mid \varphi^{t-1}(\beta^{t-2}), \beta^{t-1}\big), \qquad 2 \le t \le T \qquad (6)$$

Figure 3 depicts the model for IIHSs. Note that, in relation to Figure 2, there are some simplifications: (1) no message $w$ is needed; (2) the decoder is not used. At the beginning, a reaction function sequence $\varphi^T$ is chosen, and then the channel is used $T$ times. At each use $t$, the encoder decides the next input symbol $\alpha_t$ based on the reaction function $\varphi_t$ and the fed-back output $\beta^{t-1}$. Then the channel produces an output $\beta_t$ based on the stochastic kernel $p(\beta_t \mid \alpha^t, \beta^{t-1})$. The output is then fed back to the encoder with delay one.

[Figure 3. Channel with memory and feedback model for IIHSs: reaction functions $\varphi^T$ feed an "interactor" $\{\alpha_t = \varphi_t(\beta^{t-1})\}_{t=1}^{T}$, whose output $\alpha_t$ enters the channel $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$; the channel output $\beta_t$ is fed back to the interactor with delay one.]

We conclude this section by remarking an intriguing coincidence: the notion of reaction function sequence $\varphi^T$ on IIHSs corresponds to the notion of deterministic scheduler. In fact, each reaction function $\varphi_t$ selects the next step $\alpha_t$ on the basis of $\beta^{t-1}$ and $\alpha^{t-1}$ (generated by $\varphi^{t-1}$), and $\beta^{t-1}, \alpha^{t-1}$ represent the path up to that state.


5 Leakage in Interactive Systems

In this section we propose a notion of information flow based on our model. We follow the idea of defining leakage and maximum leakage using the concepts of mutual information and capacity (see for instance [4]), making the necessary adaptations.

Since the directed information $I(A^T \to B^T)$ is a measure of how much information flows from $A^T$ to $B^T$ in a channel with feedback (cf. Section 3.1), it is natural to consider it as a measure of leakage of information by the protocol.

Definition 7. The information leakage of an IIHS is defined as:

$$I(A^T \to B^T) = \sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1}) \;-\; H(A^T \mid B^T)$$

Note that $\sum_{t=1}^{T} H(A_t \mid A^{t-1}, B^{t-1})$ can be seen as the entropy $H_R$ of the reactor $R$.

Compare this definition with the classical information-theoretic approach to information leakage: when there is no feedback, the leakage is defined as

$$I(A^T; B^T) = H(A^T) - H(A^T \mid B^T) \qquad (7)$$

The principle behind (7) is that the leakage is equal to the difference between the a priori uncertainty $H(A^T)$ and the a posteriori uncertainty $H(A^T \mid B^T)$ (the gain in knowledge about the secret obtained by observing the output). Our definition maintains the same principle, with the proviso that the a priori uncertainty is now represented by $H_R$.
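Assuming the joint distribution $p(\alpha^T, \beta^T)$ has already been computed (for instance by unrolling equation (3) as sketched in Section 3), Definition 7 can be evaluated directly. The following sketch (ours, with an invented joint distribution for $T = 2$) does so:

```python
from math import log2
from itertools import product

def H(joint, idx):
    """Entropy of the marginal of joint (dict: tuple -> prob) on the coordinates idx."""
    m = {}
    for outcome, pr in joint.items():
        key = tuple(outcome[i] for i in idx)
        m[key] = m.get(key, 0.0) + pr
    return -sum(pr * log2(pr) for pr in m.values() if pr > 0)

def cond_H(joint, a, b):
    """H(A | B) = H(A, B) - H(B)."""
    return H(joint, a + b) - H(joint, b)

def leakage(joint, T):
    """Definition 7, with outcomes ordered (A1, B1, ..., AT, BT):
    sum_t H(A_t | A^{t-1}, B^{t-1})  -  H(A^T | B^T)."""
    a_idx = tuple(2 * t for t in range(T))          # coordinates of A1..AT
    b_idx = tuple(2 * t + 1 for t in range(T))      # coordinates of B1..BT
    reactor_entropy = sum(cond_H(joint, (2 * t,), tuple(range(2 * t))) for t in range(T))
    return reactor_entropy - cond_H(joint, a_idx, b_idx)

# Invented joint over (A1, B1, A2, B2): secrets uniform and independent, each
# observable copies the current secret with probability 0.8 (so no feedback here,
# and the leakage coincides with I(A^2; B^2)).
joint = {}
for a1, b1, a2, b2 in product((0, 1), repeat=4):
    joint[(a1, b1, a2, b2)] = 0.25 * (0.8 if b1 == a1 else 0.2) * (0.8 if b2 == a2 else 0.2)

print(round(leakage(joint, 2), 4))    # about 0.5561 bits
```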

5.1 Maximum leakage as capacity

In the case of secret-nondeterministic IIHSs, we have a stochastic kernel but no distribution on the code functions. In this case it seems natural to consider the worst leakage over all possible distributions on code functions. This is exactly the concept of capacity.

Definition 8. The maximum leakage of an IIHS is defined as the capacity $C_T$ of the associated channel with memory and feedback.

6 Modeling IIHSs as channels: An example

In this section we show the application of our approach to the Cocaine Auction Protocol [17]. Let us imagine a situation where several mob individuals are gathered around a table. An auction is about to be held in which one of them offers his next shipment of cocaine to the highest bidder. The seller describes the merchandise and proposes a starting price. The others then bid increasing amounts until there are no bids for 30 consecutive seconds. At that point the seller declares the auction closed and arranges a secret appointment with the winner to deliver the goods.

The basic protocol is fairly simple and is organized as a succession of rounds of bidding. Round $i$ starts with the seller announcing the bid price $b_i$ for that round. Buyers have $t$ seconds to make an offer (i.e. to say yes, meaning "I'm willing to buy at the current bid price $b_i$"). As soon as one buyer anonymously says yes, he becomes the winner $w_i$ of that round and a new round begins. If nobody says anything for $t$ seconds, round $i$ is concluded by timeout and the auction is won by the winner $w_{i-1}$ of the previous round, if one exists. If the timeout occurs during round 0, this means that nobody made any offer at the initial price $b_0$, so there is no sale.

Although our framework allows the formalization of this protocol for an arbitrary number of bidders and bidding rounds, for illustration purposes we will consider the case of two bidders (Candlemaker and Scarface) and two rounds of bids. Furthermore, we assume that the initial bid is always 1 dollar, so the first bid does not need to be announced by the seller. In each turn the seller can choose how much he wants to increase the current bid. This is done by adding an increment to the last bid. There are two options of increments, namely inc1 (1 dollar) and inc2 (2 dollars). In this way, $b_{i+1}$ is either $b_i + \mathit{inc1}$ or $b_i + \mathit{inc2}$. We can describe this protocol as a normalized IIHS $\mathcal{I} = (M, \mathcal{A}, \mathcal{B}, \mathcal{L}_\tau)$, where $\mathcal{A} = \{\mathit{Candlemaker}, \mathit{Scarface}, a\}$ is the set of secret actions, $\mathcal{B} = \{\mathit{inc1}, \mathit{inc2}, b\}$ is the set of observable actions, $\mathcal{L}_\tau = \emptyset$ is the set of hidden actions, and the probabilistic automaton $M$ is represented in Figure 4. For clarity, we omit transitions with probability 0 in the automaton. Note that the special secret action $a$ represents the situation where neither Candlemaker nor Scarface bids. The special observable action $b$ is only possible after no one has bid; it signals the end of the auction, after which no bid is allowed anymore.

[Figure 4. The Cocaine Auction example: the normalized probabilistic automaton $M$. From the root, the secret transitions Candlemaker, Scarface and $a$ are taken with probabilities $p_1, p_2, p_3$; they are followed by observable transitions inc1, inc2 (probabilities $q_4, \ldots, q_7$) or, after $a$, by $b$ with probability 1; a second layer of secret transitions (probabilities $p_9, \ldots, p_{20}$) and observable transitions (probabilities $q_{22}, \ldots, q_{40}$) follows, and the auction ends with $b$.]

Table 4 shows all the stochastic kernels for this example. The formalization of this protocol in terms of IIHSs using our framework makes it possible to prove the claim in [17] suggesting that if the seller knows the identity of the bidders then the (strong) anonymity guarantees are not provided anymore.

7 Topological properties of IIHSs and their Capacity

In this section we show how to extend to IIHSs the notion of pseudometric defined in [8] for Concurrent Labelled Markov Chains, and we prove that the capacity of the corresponding channels is a continuous function of this pseudometric. The metric construction is sound for general IIHSs, but the result on capacity is only valid for secret-nondeterministic IIHSs.

Given a set of states $S$, a pseudometric (or distance) is a function $d$ that yields a non-negative real number for each pair of states and satisfies the following: $d(s,s) = 0$; $d(s,t) = d(t,s)$; and $d(s,t) \le d(s,u) + d(u,t)$.


Table 4. Stochastic kernels for the Cocaine Auction example.

(a) $t = 1$, $p(\beta_1 \mid \alpha_1, \beta^0)$:

  $\alpha_1 \to \beta_1$    inc1   inc2   b
  Candlemaker               q4     q5     0
  Scarface                  q6     q7     0
  a                         0      0      1

(b) $t = 2$, $p(\beta_2 \mid \alpha^2, \beta^1)$:

  $\alpha_1, \beta_1, \alpha_2 \to \beta_2$     inc1   inc2   b
  Candlemaker, inc1, Candlemaker                q22    q23    0
  Candlemaker, inc1, Scarface                   q24    q25    0
  Candlemaker, inc1, a                          0      0      1
  Candlemaker, inc2, Candlemaker                q27    q28    0
  Candlemaker, inc2, Scarface                   q29    q30    0
  Candlemaker, inc2, a                          0      0      1
  Scarface, inc1, Candlemaker                   q32    q33    0
  Scarface, inc1, Scarface                      q34    q35    0
  Scarface, inc1, a                             0      0      1
  Scarface, inc2, Candlemaker                   q37    q38    0
  Scarface, inc2, Scarface                      q39    q40    0
  Scarface, inc2, a                             0      0      1
  a, b, a                                       0      0      1
  all other lines                               0      0      1

(The $q_i$ are the probabilities of the corresponding observable transitions in Figure 4.)

We say that a pseudometric $d$ is $c$-bounded if $\forall s, t : d(s,t) \le c$, where $c$ is a positive real number. We now define a complete lattice on pseudometrics, and define the distance between IIHSs as the greatest fixpoint of a distance transformation, in line with the coinductive theory of bisimilarity.

Definition 9. $\mathcal{M}$ is the class of 1-bounded pseudometrics on states with the ordering $d \preceq d'$ if $\forall s, s' \in S : d(s,s') \ge d'(s,s')$.

It is easy to see that $(\mathcal{M}, \preceq)$ is a complete lattice. In order to define pseudometrics on IIHSs, we now need to lift the pseudometrics on states to pseudometrics on distributions in $\mathcal{D}(\mathcal{L} \times S)$. Following standard lines [20, 8, 7], we apply the construction based on the Kantorovich metric [11].

Definition 10. For $d \in \mathcal{M}$, and $\mu, \mu' \in \mathcal{D}(\mathcal{L} \times S)$, we define $d(\mu, \mu')$ (overloading the notation $d$) as

$$d(\mu, \mu') = \max \sum_{(\ell_i, s_i) \in \mathcal{L} \times S} \big(\mu(\ell_i, s_i) - \mu'(\ell_i, s_i)\big)\, x_i$$

where the maximization is over all possible values of the $x_i$'s, subject to the constraints $0 \le x_i \le 1$ and $x_i - x_j \le \hat d\big((\ell_i, s_i), (\ell_j, s_j)\big)$, where $\hat d\big((\ell_i, s_i), (\ell_j, s_j)\big) = 1$ if $\ell_i \neq \ell_j$, and $\hat d\big((\ell_i, s_i), (\ell_j, s_j)\big) = d(s_i, s_j)$ otherwise.

It can be shown that with this definition $d(\mu, \mu')$ is a pseudometric on $\mathcal{D}(\mathcal{L} \times S)$.
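The maximization in Definition 10 is a finite linear program, so the lifted distance can be computed with an off-the-shelf LP solver. A sketch (ours, using scipy; the toy support and ground distances are invented) is:

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(mu1, mu2, dhat):
    """Lifting of Definition 10: maximize sum_i (mu1_i - mu2_i) x_i subject to
    0 <= x_i <= 1 and x_i - x_j <= dhat[i][j], where mu1, mu2 are distributions
    over a common support of (label, state) pairs and dhat is the ground distance."""
    n = len(mu1)
    c = -(np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float))  # linprog minimizes
    rows, rhs = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                row = np.zeros(n)
                row[i], row[j] = 1.0, -1.0
                rows.append(row)
                rhs.append(dhat[i][j])
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0.0, 1.0)] * n, method='highs')
    return -res.fun

# Toy support (l, s1), (l, s2), (l', s3): s1 and s2 at ground distance 0.2,
# pairs with different labels at distance 1.
mu1 = [0.5, 0.3, 0.2]
mu2 = [0.3, 0.5, 0.2]
dhat = [[0.0, 0.2, 1.0],
        [0.2, 0.0, 1.0],
        [1.0, 1.0, 0.0]]
print(kantorovich(mu1, mu2, dhat))   # 0.2 * 0.2 = 0.04
```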

Definition 11. $d \in \mathcal{M}$ is a bisimulation metric if, for all $\epsilon \in [0, 1)$, $d(s, s') \le \epsilon$ implies that if $s \to \mu$, then there exists some $\mu'$ such that $s' \to \mu'$ and $d(\mu, \mu') \le \epsilon$.

The greatest bisimulation metric is $d_{\max} = \bigsqcup \{ d \in \mathcal{M} \mid d \text{ is a bisimulation metric} \}$. We now characterize $d_{\max}$ as a fixed point of a monotonic function $\Phi$ on $\mathcal{M}$. For simplicity, from now on we consider only the distance between states belonging to different IIHSs with disjoint sets of states.


Definition 12. Given two IIHSs with transition relations $\vartheta$ and $\vartheta'$ respectively, and a pseudometric $d$ on states, define $\Phi : \mathcal{M} \to \mathcal{M}$ as:

$$\Phi(d)(s, s') = \begin{cases} \max_i d(s_i, s'_i) & \text{if } \vartheta(s) = \{\delta(a_1, s_1), \ldots, \delta(a_m, s_m)\} \text{ and } \vartheta'(s') = \{\delta(a_1, s'_1), \ldots, \delta(a_m, s'_m)\} \\ d(\mu, \mu') & \text{if } \vartheta(s) = \{\mu\} \text{ and } \vartheta'(s') = \{\mu'\} \\ 0 & \text{if } \vartheta(s) = \vartheta'(s') = \emptyset \\ 1 & \text{otherwise} \end{cases}$$

It is easy to see that the definition of $\Phi$ is a particular case of the function $F$ defined in [8, 7]. Hence it can be proved, by adapting the proofs of the analogous results in [8, 7], that $\Phi(d)$ is a pseudometric, and that $d$ is a bisimulation metric iff $d \preceq \Phi(d)$. This implies that $d_{\max} = \bigsqcup \{ d \in \mathcal{M} \mid d \preceq \Phi(d) \}$, and, still as a particular case of $F$ in [8, 7], we have that $\Phi$ is monotonic on $\mathcal{M}$. By Tarski's fixed point theorem, $d_{\max}$ is the greatest fixed point of $\Phi$. Furthermore, in [1] we show that $d_{\max}$ is indeed a bisimulation metric, and that it is the greatest bisimulation metric. In addition, the finite branchingness of IIHSs ensures that the closure ordinal of $\Phi$ is $\omega$ (cf. Lemma 3.10 in the full version of [8]). Therefore one can show that $d_{\max} = \bigsqcap \{ \Phi^i(\top) \mid i \in \mathbb{N} \}$, where $\top$ is the greatest pseudometric (i.e. $\top(s, s') = 0$ for every $s, s'$), and $\Phi^0(\top) = \top$.

Given two IIHSs $\mathcal{I}$ and $\mathcal{I}'$, with initial states $\hat s$ and $\hat s'$ respectively, we define the distance between $\mathcal{I}$ and $\mathcal{I}'$ as $d(\mathcal{I}, \mathcal{I}') = d_{\max}(\hat s, \hat s')$. The next theorem states the continuity of the capacity w.r.t. the metric on IIHSs. It is crucial that the IIHSs are secret-nondeterministic (while the definition of the metric holds in general).

Theorem 4. Consider two normalized IIHSs $\mathcal{I}$ and $\mathcal{I}'$, and fix a $T > 0$. For every $\epsilon > 0$ there exists $\nu > 0$ such that if $d(\mathcal{I}, \mathcal{I}') < \nu$ then $|C_T(\mathcal{I}) - C_T(\mathcal{I}')| < \epsilon$.

We conclude this section with an example showing that the continuity result for the capacity does not hold if the construction of the channel is done starting from a system in which the secrets are endowed with a probability distribution. This is also the reason why we could not simply adopt the proof technique of the continuity result in [8] and we had to come up with a different reasoning.

Example 3. Consider the two following programs, where $a_1, a_2$ are secrets, $b_1, b_2$ are observables, $\parallel$ is the parallel operator, and $+_p$ is a binary probabilistic choice that assigns probability $p$ to the left branch and probability $1-p$ to the right one.

s) $(\mathit{send}(a_1) +_p \mathit{send}(a_2)) \parallel \mathit{receive}(x).\mathit{output}(b_2)$

t) $(\mathit{send}(a_1) +_q \mathit{send}(a_2)) \parallel \mathit{receive}(x).\mathit{if}\; x = a_1 \;\mathit{then}\; \mathit{output}(b_1) \;\mathit{else}\; \mathit{output}(b_2)$

Table 5 shows the fully probabilistic IIHSs corresponding to these programs, and their associated channels, which in this case (since the secret actions are all at the top level) are classical channels, i.e. memoryless and without feedback. As usual for classical channels, they do not depend on $p$ and $q$. It is easy to see that the capacity of the first channel is 0 and the capacity of the second one is 1. Hence their difference is 1, independently of $p$ and $q$.

Let now $p = 0$ and $q = \epsilon$. It is easy to see that the distance between $s$ and $t$ is $\epsilon$. Therefore (when the automata have probabilities on the secrets), the capacity is not a continuous function of the distance.
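The two capacities can be confirmed with a small self-contained computation (ours; a brute-force search over binary input distributions rather than a closed form):

```python
from math import log2

def mutual_information(p_in, W):
    """I(A;B) for a binary-input channel W[a][b] = p(b|a) and input distribution p_in."""
    q = [sum(p_in[a] * W[a][b] for a in range(2)) for b in range(2)]
    return sum(p_in[a] * W[a][b] * log2(W[a][b] / q[b])
               for a in range(2) for b in range(2) if p_in[a] * W[a][b] > 0)

def capacity(W, steps=1000):
    """Brute-force the (classical) capacity over a grid of input distributions."""
    return max(mutual_information([k / steps, 1 - k / steps], W) for k in range(steps + 1))

W_s = [[0.0, 1.0], [0.0, 1.0]]   # channel of program s: both secrets yield b2
W_t = [[1.0, 0.0], [0.0, 1.0]]   # channel of program t: the output reveals the secret
print(capacity(W_s), capacity(W_t))   # 0.0 and 1.0
```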


Table 5. The IIHSs of Example 3 and their corresponding channels. In the IIHS for $s$, the secret $a_1$ is chosen with probability $p$ and $a_2$ with probability $1-p$; in both cases the observable $b_2$ is produced with probability 1. In the IIHS for $t$, the secret $a_1$ is chosen with probability $q$ and $a_2$ with probability $1-q$; after $a_1$ the observable is $b_1$, after $a_2$ it is $b_2$.

(a) channel for $s$:        (b) channel for $t$:

        b1    b2                     b1    b2
  a1    0     1               a1     1     0
  a2    0     1               a2     0     1

8 Conclusion and future work

In this paper we have investigated the problem of information leakage in interactive systems, and we have proved that these systems can be modeled as channels with memory and feedback. The situation is summarized in Table 6(a). The comparison with the classical situation of non-interactive systems is represented in Table 6(b). Furthermore, we have proved that the channel capacity is a continuous function of the Kantorovich metric.

Table 6. (a) IIHSs as automata, their counterparts as channels, and the corresponding notion of leakage:

– Normalized IIHSs with nondeterministic inputs and probabilistic outputs correspond to a sequence of stochastic kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$; the leakage is the capacity.
– Normalized IIHSs with a deterministic scheduler solving the nondeterminism correspond to a sequence of stochastic kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ together with a reaction function sequence $\varphi^T$.
– Fully probabilistic normalized IIHSs correspond to a sequence of stochastic kernels $\{p(\beta_t \mid \alpha^t, \beta^{t-1})\}_{t=1}^{T}$ together with a reactor $\{p(\varphi_t \mid \varphi^{t-1})\}_{t=1}^{T}$; the leakage is the directed information $I(A^T \to B^T)$.

(b) Classical channels vs. channels with memory and feedback:

– Classical channels: the protocol is modeled with independent uses of the channel, often a unique use. Channels with memory and feedback: the protocol is modeled with several consecutive uses of the channel.
– Classical channels: the channel is from $\mathcal{A}^T$ to $\mathcal{B}^T$, i.e. its input is a single string $\alpha^T = \alpha_1 \ldots \alpha_T$ of secret symbols and its output is a single string $\beta^T = \beta_1 \ldots \beta_T$ of observable symbols. Channels with memory and feedback: the channel is from $\mathcal{F}$ to $\mathcal{B}$, i.e. its input is a reaction function $\varphi_t$ and its output is an observable $\beta_t$.
– Classical channels: the channel is memoryless and in general the absence of feedback is implicitly assumed. Channels with memory and feedback: the channel has memory; despite the fact that the channel from $\mathcal{F}$ to $\mathcal{B}$ does not have feedback, the internal stochastic kernels do.
– Classical channels: the capacity is calculated using mutual information $I(A^T; B^T)$. Channels with memory and feedback: the capacity is calculated using directed information $I(A^T \to B^T)$.

For future work we would like to provide algorithms to compute the leakage and maximum leakage of interactive systems. These problems are very challenging given the exponential growth of reaction functions (needed to compute the leakage) and the quantification over infinitely many reactors (given by the definition of maximum leakage in terms of capacity). One possible solution is to study the relation between deterministic schedulers and sequences of reaction functions. In particular, we believe that for each sequence of reaction functions and distribution over it there exists a probabilistic scheduler for the automata representation of the secret-nondeterministic IIHS. In this way, the problem of computing the leakage and maximum leakage would reduce to a standard probabilistic model checking problem (where the challenge is to compute probabilities ranging over infinitely many schedulers).

In addition, we plan to investigate measures of leakage for interactive systems other than mutual information and capacity.

References

1. M. S. Alvim, M. E. Andrés, and C. Palamidessi. Information Flow in Interactive Systems, 2010. http://hal.archives-ouvertes.fr/inria-00479672/en/.
2. M. E. Andrés, C. Palamidessi, P. van Rossum, and G. Smith. Computing the leakage of information-hiding systems. In Proc. of TACAS, volume 6015 of LNCS, pages 373–389. Springer, 2010.
3. A. Bohannon, B. C. Pierce, V. Sjöberg, S. Weirich, and S. Zdancewic. Reactive noninterference. In Proc. of CCS, pages 79–90. ACM, 2009.
4. K. Chatzikokolakis, C. Palamidessi, and P. Panangaden. Anonymity protocols as noisy channels. Inf. and Comp., 206(2–4):378–401, 2008.
5. D. Clark, S. Hunt, and P. Malacaria. Quantified interference for a while language. In Proc. of QAPL 2004, volume 112 of ENTCS, pages 149–166. Elsevier, 2005.
6. T. M. Cover and J. A. Thomas. Elements of Information Theory. J. Wiley & Sons, Inc., 1991.
7. Y. Deng, T. Chothia, C. Palamidessi, and J. Pang. Metrics for action-labelled quantitative transition systems. In Proc. of QAPL, volume 153 of ENTCS, pages 79–96. Elsevier, 2006.
8. J. Desharnais, R. Jagadeesan, V. Gupta, and P. Panangaden. The metric analogue of weak bisimulation for probabilistic processes. In Proc. of LICS, pages 413–422. IEEE, 2002.
9. Ebay website. http://www.ebay.com/.
10. Ebid website. http://www.ebid.net/.
11. L. Kantorovich. On the transfer of masses (in Russian). Doklady Akademii Nauk, 5(1):1–4, 1942. Translated in Management Science, 5(1):1–4, 1958.
12. P. Malacaria. Assessing security threats of looping constructs. In Proc. of POPL, pages 225–235. ACM, 2007.
13. J. L. Massey. Causality, feedback and directed information. In Proc. of the 1990 Intl. Symposium on Information Theory and its Applications, 1990.
14. Mercadolibre website. http://www.mercadolibre.com/.
15. R. Segala. Modeling and Verification of Randomized Distributed Real-Time Systems. PhD thesis, 1995. Tech. Rep. MIT/LCS/TR-676.
16. G. Smith. On the foundations of quantitative information flow. In Proc. of FOSSACS, volume 5504 of LNCS, pages 288–302. Springer, 2009.
17. F. Stajano and R. J. Anderson. The cocaine auction protocol: On the power of anonymous broadcast. In Information Hiding, pages 434–447, 1999.
18. S. Subramanian. Design and verification of a secure electronic auction protocol. In Proc. of SRDS, pages 204–210. IEEE, 1998.
19. S. Tatikonda and S. K. Mitter. The capacity of channels with feedback. IEEE Transactions on Information Theory, 55(1):323–349, 2009.


20. F. van Breugel and J. Worrell. Towards quantitative verification of probabilistic transition systems. In Proc. of ICALP, volume 2076 of LNCS, pages 421–432. Springer, 2001.
21. W. Vickrey. Counterspeculation, Auctions, and Competitive Sealed Tenders. The Journal of Finance, 16(1):8–37, 1961.
