
HAL Id: hal-01845330

https://hal.inria.fr/hal-01845330

Submitted on 20 Jul 2018


On the Additive Capacity Problem for Quantitative Information Flow

Konstantinos Chatzikokolakis

To cite this version:

Konstantinos Chatzikokolakis. On the Additive Capacity Problem for Quantitative Information Flow. 15th International Conference on Quantitative Evaluation of SysTems (QEST 2018), Sep 2018, Beijing, China. pp. 1-19. ⟨hal-01845330⟩


On the Additive Capacity Problem for Quantitative Information Flow

Konstantinos Chatzikokolakis, CNRS, France

Abstract. Preventing information leakage is a fundamental goal in achieving confidentiality. In many practical scenarios, however, eliminating such leaks is impossible. It becomes then desirable to quantify the severity of such leaks and establish bounds on the threat they impose. Aiming at developing measures that are robust wrt a variety of operational conditions, a theory of channel capacity for the g-leakage model was developed in [1], providing solutions for several scenarios in both the multiplicative and the additive setting.

This paper continues this line of work by providing substantial improvements over the results of [1] for additive leakage. The main idea of employing the Kantorovich distance remains, but it is now applied to quasimetrics, and in particular the novel “convex-separation” quasimetric. The benefits are threefold: first, it allows to maximize leakage over a larger class of gain functions, most notably including the one of Shannon. Second, a solution is obtained to the problem of maximizing leakage over both priors and gain functions, left open in [1]. Third, it allows to establish an additive variant of the “Miracle” theorem from [3].

Keywords: Quantitative information flow · capacity · Kantorovich distance.

1 Introduction

Preventing sensitive information from being leaked is a fundamental goal of computer security. There are many situations, however, in which completely eliminating such leaks is impossible for a variety of reasons. Sometimes the leak is intentional: we want to extract knowledge from a statistical database; sometimes it is due to side channels that are hard or impossible to fully control; sometimes the leak is in exchange for a service, as in the case of Location-Based Services; sometimes it is in exchange for efficiency: i.e. using a weaker but more efficient anonymous communication system.

In these cases, it becomes crucial to quantify such leaks, measure how important the threat they pose is, and decide whether they can be tolerated or not. This problem is studied in the area of quantitative information flow, in which much progress has been made in recent years, both from a foundational viewpoint [17,21,11,24,22,12,6,3], but also in the development of counter-measures and verification techniques [4,5,10,19,27,23,16,8,7,25], and the analysis of real systems [14,20,15,18].

Robustness is a fundamental theme in this area; we aim at developing measures and bounds that are robust wrt a variety of adversaries and operational scenarios. In the context of the successful g-leakage model, the operational scenario is captured by a gain function g, and the adversary’s knowledge by a prior π. Developing the theme of


robustness in this model, [1] studied the theory of channel capacity, that is the problem of maximizing leakage over π for a fixed g, maximizing over g for a fixed π, or maximizing over both π and g. Comparing the system’s prior and posterior vulnerability can be done either multiplicatively or additively, leading to a total of six capacity scenarios.

In this paper we make substantial progress in two of the scenarios for additive leakage, namely in maximizing over g alone, or over both π, g. When maximizing over g, we quickly realize that if we allow vulnerability to take values in the whole R≥0, we can always scale it up, leading to unbounded capacity. In practice, however, it is common to measure vulnerability within a predefined range; for instance, vulnerabilities capturing the probability of some unfortunate event (e.g. Bayes vulnerability) take values in [0,1], while vulnerabilities measuring bits of information (e.g. Shannon vulnerability) take values in [0, log2|X|]. It is thus natural to restrict to a class G of gain functions, in which the range of vulnerabilities is limited. In [1], this is achieved by the class G1X of 1-spanning gain functions, in which the gain of different secrets varies by at most 1.

Although G1X provides a solution for capacity, this choice is not completely satisfactory from the point of view of robustness, since it excludes important vulnerability functions. Most notably, Shannon vulnerability (the complement of entropy) is not k-spanning for any k, hence the capacity bound for G1X does not apply, and indeed the leakage in this case (known as mutual information) does exceed the bound. In this paper we take a more permissive approach, by imposing the 1-spanning condition not on g itself, but on the corresponding vulnerability function Vg, leading to the class GlX. Since any vulnerability is k-spanning for some k, this class does not a priori exclude any type of adversary, it only restricts the range of values.

Solving the capacity problem for GlX is however not straightforward. It turns out that the core technique from [1], namely the use of the Kantorovich distance on the hyper-distribution produced by the channel, can still be applied. However, substantial modifications are needed, involving the use of quasimetrics, and in particular the novel “convex-separation” quasimetric, replacing the total variation used in [1]. These improvements not only lead to a solution to the problem of maximizing leakage over g: GlX, but also lead to a solution for the third scenario of maximizing over both π, g, as well as to a variant of the “Miracle” theorem for the additive setting.

In detail, the paper makes the following contributions to the study of g-capacity:

– We present a general technique for computing additive capacity wrt a class of gain functions G, using the Kantorovich distance over a properly constructed quasimetric.

– This technique is instantiated for G1X using the total variation metric, recovering the results of [1] in a more structured way.

– The same technique is then instantiated for the larger class GlX, using the novel “convex-separation” quasimetric, for which an efficient solution is provided.

– The results for GlX also provide an immediate solution to the scenario of maximizing over both π, g, which was left completely open in [1].

– Finally, the results for GlX lead to an “Additive Miracle” theorem, similar in nature to the “Miracle” theorem of [3] for the multiplicative case.

Acknowledgements All results were obtained in the process of preparing a manuscript on Quantitative Information Flow with my long-term collaborators M. Alvim, C. Morgan, A. McIver, C. Palamidessi and G. Smith, and were heavily influenced by their feedback.


π = (1/3, 1/3, 1/3)

C     y1    y2    y3    y4
x1    1     0     0     0
x2    0     1/2   1/4   1/4
x3    1/2   1/3   1/6   0

−→  J     y1    y2    y3    y4
    x1    1/3   0     0     0
    x2    0     1/6   1/12  1/12
    x3    1/6   1/9   1/18  0

−→  [πŻC]  1/2   5/12  1/12
    x1     2/3   0     0
    x2     0     3/5   1
    x3     1/3   2/5   0

Fig. 1. A prior π (type DX), a channel C (rows have type DY), a joint J (type D(X × Y)), and a hyper [πŻC] (type D²X, each column has type DX).

2 Preliminaries

Channels and their effect on the adversary’s knowledge  A channel C is a simple probabilistic model describing the behavior of a system that takes input values from a finite set X (the secrets) and produces outputs from a finite set Y (the observations). Formally, it is a stochastic |X| × |Y| matrix, meaning that elements are non-negative and rows sum to 1. Cx,y can be thought of as the conditional probability of producing y when the input is x.

We denote by DA the set of all discrete distributions on A, and by [a]: DA the point distribution, assigning probability 1 to a: A. Given C and a distribution π: DX, called the prior, we can create a joint distribution J: D(X × Y) as Jx,y = πx Cx,y. When J is understood, it is often written in the usual notation p(x, y), in which case the conditional probabilities p(y|x) = p(x,y)/p(x) coincide with Cx,y (when p(x) is non-zero) and the x-marginals p(x) = Σy p(x, y) coincide with πx.

The prior π can be thought of as the initial knowledge that the adversary has about the secret. When secrets are passwords, for instance, she might know that some are more likely to be chosen than others. Always assuming that C is known to the adversary, each output y provides evidence that allows her to update her knowledge, creating a posterior distribution δy, defined as δyx = p(x,y)/p(y). This, of course, can be done for each output; every y: Y potentially provides information to the adversary leading to an updated probabilistic knowledge δy. But not all outputs have the same status; each happens with a different marginal probability p(y) = Σx p(x, y), denoted by ay. Hence, the effect of a channel C on the adversary’s prior knowledge π is to produce a set of posteriors δy, each with probability ay. It is conceptually useful to view this outcome as a single distribution on distributions, called a hyper-distribution or just hyper. Such a hyper has type D²X and is denoted by [πŻC] = Σy ay [δy].¹ The ay-component of [πŻC] is called the outer distribution, expressing the probability of obtaining the posteriors δy, called the inner distributions.

An example of all constructions is given in Fig. 1. From a channel C and the uniform prior π we can construct the joint J by multiplying (element-wise) each column of C by π. J is then a single distribution assigning probabilities to each pair (x, y). To construct the hyper [πŻC], we normalize (i.e. divide by p(y)) each column J−,y, forming the posterior δy. The marginals p(y) become the outer probabilities ay, labeling the columns of [πŻC]. Finally, note that [πŻC] no longer records the original label y of each column. As a consequence, the columns y2 and y3, both producing the same posterior δy2 = δy3 = (0, 3/5, 2/5), are merged in [πŻC], which assigns to that posterior the combined probability p(y2) + p(y3) = 5/12. This phenomenon happens automatically by the construction of the hyper.

¹ The notation is due to the fact that distributions can be convexly combined (DA is a vector space). Σy ay [δy] is exactly the hyper assigning probability ay to each δy.
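To make the construction concrete, here is a short numpy sketch (my own illustration, not code from the paper; the helper name hyper is hypothetical) that builds the joint, the posteriors and the outer probabilities, merging equal posteriors exactly as described above. Running it on the data of Fig. 1 reproduces the hyper shown there.

```python
import numpy as np

def hyper(pi, C, decimals=12):
    """Construct the hyper [pi > C]: outer probabilities and posterior (inner) distributions.

    pi : prior over X, shape (n,);  C : channel matrix, shape (n, m), rows sum to 1.
    Posteriors that coincide (after rounding) are merged, as in Fig. 1.
    """
    J = pi[:, None] * C                  # joint: J[x, y] = pi_x * C[x, y]
    p_y = J.sum(axis=0)                  # output marginals p(y), i.e. the outer weights
    merged = {}                          # posterior (as a rounded tuple) -> accumulated outer prob.
    for y in range(C.shape[1]):
        if p_y[y] == 0:                  # outputs with zero probability contribute nothing
            continue
        delta = tuple(np.round(J[:, y] / p_y[y], decimals))
        merged[delta] = merged.get(delta, 0.0) + p_y[y]
    inners = [np.array(d) for d in merged]
    outer = np.array(list(merged.values()))
    return outer, inners

# The example of Fig. 1: uniform prior and the channel C.
pi = np.array([1/3, 1/3, 1/3])
C = np.array([[1, 0, 0, 0],
              [0, 1/2, 1/4, 1/4],
              [1/2, 1/3, 1/6, 0]])
outer, inners = hyper(pi, C)
# outer ≈ [1/2, 5/12, 1/12]; the posteriors for y2 and y3 are merged into (0, 3/5, 2/5).
```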

Vulnerability and leakage  A fundamental notion in measuring the information leakage of a system is that of a vulnerability function V: DX → R≥0. The goal of V(π) is to measure how vulnerable a system is when the adversary has knowledge π about the secret.

To create a suitable vulnerability function we need to consider the operational scenario at hand: we first determine what the adversary is trying to achieve, then take V(π) as a measure of how successful the adversary is in that goal. Clearly, no single function can capture all operational scenarios; as a consequence a variety of vulnerability functions has been proposed in the literature, each having a different operational interpretation.

For instance, Bayes-vulnerability VB(π) := maxx πx measures the probability of success of an adversary who tries to guess the complete secret in one try; Shannon-vulnerability VH(π) := log2|X| + Σx πx log2 πx (the complement of entropy) measures the expected number of Boolean questions needed to reveal the secret; and Guessing-vulnerability VG(π) = |X| + 1/2 − Σi i·πi (where the i-indexing of X is in non-decreasing probability order) measures the expected number of tries to guess the secret correctly.

To study vulnerability in a unifying way, the g-leakage framework was introduced in [3], in which the operating scenario is parametrized by a (possibly infinite) set of actions W, and a gain function g: W × X → R. Intuitively, W consists of actions that the adversary can perform to exploit his knowledge about the system. Then, g(w, x) models the adversary’s reward when he performs the action w and the actual secret is x. In such an operational scenario, it is natural to define g-vulnerability Vg as the expected gain of a rational adversary who chooses the best available action:

  Vg(π) := supw Σx πx g(w, x).

The g-leakage framework is quite expressive, allowing to obtain a variety of vulnerability functions as special cases for suitable choices of g. For instance, by picking W = X and the identity gain function given by gid(w, x) = 1 iff w = x and 0 otherwise, we get Vgid = VB. Similarly, we can construct gain functions expressing Shannon (for which an infinite W is needed) and Guessing vulnerabilities, as well as a variety of other operational scenarios. In fact, it can be shown that any continuous and convex vulnerability V: DX → R≥0 can be written as Vg for some g [2].
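As a small illustration (my own sketch, not part of the paper), for a finite action set W the g-vulnerability is simply the maximum of expected gains over the rows of a gain matrix; with the identity gain function it reduces to Bayes vulnerability.

```python
import numpy as np

def V_g(pi, G):
    """g-vulnerability V_g(pi) = max_w sum_x pi_x * g(w, x), for a finite action set.

    pi : prior over X, shape (n,);  G : gain matrix, shape (|W|, n), G[w, x] = g(w, x).
    """
    return float(np.max(G @ pi))

pi = np.array([0.6, 0.3, 0.1])
G_id = np.eye(3)             # identity gain function: W = X, g(w, x) = 1 iff w = x
print(V_g(pi, G_id))         # 0.6 = max_x pi_x, i.e. the Bayes vulnerability of pi
```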

For expressiveness, it is crucial to allow g to potentially take negative values, and W to be infinite. However, it is desirable that Vg itself be non-negative and finite-valued, since it is meant to express vulnerability. As a consequence we always restrict to the class GX of gain functions, defined as those g such that Vg: DX → R≥0. As already discussed in the introduction, it is often desirable to further restrict to subsets of GX.

Having established a way to measure vulnerability in the prior case, we move on to measuring how vulnerable our system is after observing the output of a channel C. Viewing the outcome of C on π as the hyper [πŻC], there is a natural answer: we can measure the vulnerability of each posterior (inner) distribution of [πŻC], then average by the outer probabilities, leading to the following definition of posterior g-vulnerability:

  Vg[πŻC] := Σy ay Vg(δy)   where   [πŻC] = Σy ay [δy]


Finally, information leakage is measured by comparing the vulnerability in the prior and posterior case. Depending on how we compare the two vulnerabilities, this leads to the additive or multiplicative leakage, defined as:

  L+g(π, C) := Vg[πŻC] − Vg(π),    L×g(π, C) := Vg[πŻC] / Vg(π).
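Continuing the sketch above (again my own code, with hypothetical helper names, assuming a finite action set represented as a gain matrix G), posterior vulnerability and the two leakages follow directly from these definitions.

```python
import numpy as np

def V_g(pi, G):
    """Prior g-vulnerability for a finite gain matrix G (rows: actions)."""
    return float(np.max(G @ pi))

def posterior_V_g(pi, C, G):
    """Posterior g-vulnerability V_g[pi > C] = sum_y a_y * V_g(delta_y)."""
    J = pi[:, None] * C
    p_y = J.sum(axis=0)
    return float(sum(p_y[y] * np.max(G @ (J[:, y] / p_y[y]))
                     for y in range(C.shape[1]) if p_y[y] > 0))

def leakages(pi, C, G):
    """Additive and multiplicative g-leakage of channel C on prior pi."""
    prior, post = V_g(pi, G), posterior_V_g(pi, C, G)
    return post - prior, post / prior

pi = np.array([1/3, 1/3, 1/3])
C = np.array([[1, 0, 0, 0],
              [0, 1/2, 1/4, 1/4],
              [1/2, 1/3, 1/6, 0]])
add, mult = leakages(pi, C, np.eye(3))   # identity gain: Bayes leakage
# For the channel of Fig. 1: add = 2/3 - 1/3 = 1/3 and mult = 2.
```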

Additive g-capacities  A fundamental theme when measuring information leakage is robustness; we need bounds that are robust wrt a variety of different adversaries and operational scenarios. Following this theme, since the g-leakage of a channel C depends on both the prior π and the gain function g, it is natural to ask what is the maximum leakage of C, over a class of gain functions G ⊆ GX and a class of priors D ⊆ DX. This maximum leakage is known as the capacity of C.

Definition 1. The additive (G, D)-capacity of C, for G ⊆ GX, D ⊆ DX, is

  ML+G(D, C) := supg:G, π:D L+g(π, C).

For brevity, when maximizing only over π for a fixed g, we write ML+g(D, C) instead of ML+{g}(D, C); similarly, when π is fixed we write ML+G(π, C); for specific classes, say G = GX, D = DX, we write ML+G(D, C) instead of ML+GX(DX, C). We can maximize over π, or g, or both, getting three scenarios for additive capacity. The multiplicative capacity ML×G(D, C), defined similarly, is outside the scope of this paper; the corresponding three scenarios are studied in [1].

For the first scenario, g is fixed and we maximize over the whole DX. For some gain functions an efficient solution exists; for instance, for gH giving Shannon vulnerability, ML+gH(D, C) is the Shannon capacity (maximum transmission rate)², which can be computed using the well-known Blahut-Arimoto algorithm [13]. However, for gid (giving Bayes vulnerability), bounding ML+gid(D, C) is known to be NP-complete [1], which of course leaves no hope for a general solution.
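For reference, here is a minimal sketch of the Blahut-Arimoto iteration mentioned above (my own code, not taken from [13]); it uses natural-log arithmetic inside the iteration, reports the capacity in bits, and assumes every column of C receives nonzero probability from some input.

```python
import numpy as np

def blahut_arimoto(C, iters=1000, tol=1e-12):
    """Shannon capacity max_pi I(pi; C) of a channel matrix C (rows: inputs), in bits."""
    n, m = C.shape
    r = np.full(n, 1.0 / n)                        # current guess for the maximizing prior
    for _ in range(iters):
        q = r[:, None] * C
        q = q / q.sum(axis=0, keepdims=True)       # posterior q(x|y) under the current prior
        with np.errstate(divide='ignore', invalid='ignore'):
            log_q = np.where(C > 0, np.log(q), 0.0)
        r_new = np.exp((C * log_q).sum(axis=1))    # r(x) ∝ exp( sum_y C[x,y] log q(x|y) )
        r_new = r_new / r_new.sum()
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    # mutual information I(r; C) at the returned prior, in bits
    J = r[:, None] * C
    p_y = J.sum(axis=0)
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = np.where(J > 0, J * np.log2(J / (r[:, None] * p_y[None, :])), 0.0)
    return terms.sum(), r

cap, prior = blahut_arimoto(np.array([[0.8, 0.2], [0.2, 0.8]]))
# For this binary symmetric channel, cap ≈ 1 - H(0.2) ≈ 0.278 bits, attained at the uniform prior.
```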

The second scenario (fixed π, maximize over g) is the main focus of this paper and is studied in detail in §3. Our solution turns out to also provide an answer for the third scenario (maximize over both π, g), discussed in §3.5.

3 Computing additive capacities

In this section we study the problem of computing the additive (G, π)-capacity. We quickly realize, however, that the unrestricted maximization over the whole GX yields unbounded leakage. The problem is the unbounded range of Vg, and can be illustrated by “scaling” g. Define the scaling of g by k > 0 as g×k(w, x) = k·g(w, x). It is easy to show [1] that this operation gives leakage L+g×k(π, C) = k·L+g(π, C), and since k can be arbitrary we get that ML+G(π, C) = +∞.³

² Which is why we generally refer to the maximization of leakage as “capacity”.

³ The same phenomenon happens for multiplicative leakage, this time demonstrated by shifting. To keep ML×G(π, C) bounded we can restrict to the class G+X of non-negative gain functions.


There are important classes of gain functions, however, which effectively limit the range of Vg, causing the additive leakage to remain bounded. Even when ML+G(π, C) is finite, computing it efficiently is non-trivial. A solution can be obtained by exploiting the fact that ML+G(π, C) is connected to the well-known Kantorovich distance between [π] and [πŻC].

This section proceeds as follows. In §3.1 we recall the Kantorovich distance and then use it in §3.2 to obtain a generic technique for computing ML+G(π, C), in time linear in the size of C, using properties of G. We apply these bounds to obtain efficient solutions for the class G1X of 1-spanning gain functions in §3.3, as well as the class GlX of gain functions giving 1-spanning vulnerability in §3.4. Finally, §3.5 discusses the scenario of maximizing over both π and g.

3.1 The Kantorovich and Wasserstein distances

We begin by recalling the Kantorovich distance between probability distributions. Given α: DA and a random variable F: A → R we write EαF for the expected value of F over α, or Ex∼α F(x) to make precise the variable we are averaging over. Observe that for a point distribution centered at a ∈ A we have E[a]F = F(a).

A function d: A² → R is called a quasimetric iff it satisfies the triangle inequality and d(a, a′) = 0 ∧ d(a′, a) = 0 iff a = a′. If d is also symmetric it is called a metric. The set of all quasimetrics on A is denoted by MA. Although less frequently used than metrics, quasimetrics will play an important role in computing additive capacity in §3.4.

A natural quasimetric on R is given by

  d<R(x, y) := max{y − x, 0}.

Intuitively, d<R(x, y) measures “how much smaller” than y is x; 0 means that x is no smaller than y. This quasimetric can be extended to x, y ∈ Rⁿ as d<Rn(x, y) = Σi d<R(xi, yi), giving an “asymmetric Manhattan” distance.

A function F: A → R is called d, dT-Lipschitz iff

  dT(F(a), F(a′)) ≤ d(a, a′)  for all a, a′ ∈ A.    (1)

The set of all such functions (also called contractions) is denoted by Cd,dT A. For the source metric, a scaled distance k·d (for some k ≥ 0) is often used. For the target metric dT the Euclidean distance dR is commonly employed (in which case we might simply write d-Lipschitz for brevity). In this section, however, we consider functions that are d, d<R-Lipschitz, which holds iff

  F(a′) − F(a) ≤ d(a, a′)  for all a, a′ ∈ A.    (2)

Note that the max from the definition of d<R is not needed, since d(a, a′) is non-negative.

The Kantorovich construction allows us to lift a metric d on A to a metric on probability distributions over A. The standard construction is done by maximizing |EαF − Eα′F| over all functions F that are d, dR-Lipschitz. Note that the Euclidean distance is implicitly used twice in this construction: first, in the Lipschitz condition and second, for comparing EαF and Eα′F. We can, however, define variants of the Kantorovich by using any other distance on R. Our purpose is to work with quasimetrics, hence we employ d<R, leading to the following definition.


Definition 2. The Kantorovich quasimetric is the mapping K<: MA → MDA:

  K<(d)(α, α′) := supF d<R(EαF, Eα′F) = supF (Eα′F − EαF),

where the sup ranges over F: Cd,d<R A. Note that the max was again dropped from d<R, since the sup is anyway non-negative (Eα′F − EαF = 0 for F constant).

An important property of the Kantorovich distance is that it has a dual formulation as the Wasserstein (or “earth-moving”) metric, for which efficient algorithms exist. Earth moving measures the cost of transforming one distribution into another, using the underlying distance d as the cost function. Given two distributions α, α′: DA (the “source” and “target”), an earth-moving strategy is a joint distribution S ∈ DA² whose two marginals are α and α′. We write Sα,α′ for the set of such strategies. The Wasserstein distance is then defined as the minimum transportation cost; similarly to Kantorovich, it provides a lifting from MA to MDA.

Definition 3. The Wasserstein distance is the mapping W: MA → MDA given by:

  W(d)(α, α′) := infS:Sα,α′ ES d.

The well-known Kantorovich-Rubinstein theorem [26] states that, if (d, A) is a separable metric space then K(d) = W(d). For our purposes, we will use this result in the restricted case where one of the two distributions is a point distribution [a]. This restriction is useful for two reasons: first, it allows us to give a simplified proof, adjusted to our K< variant, drop the assumption of separability and allow d to be a quasimetric. Second, we show that W(d)([a], α) can be easily obtained as the expected (wrt α) distance between a and the elements in the support of α.

We first fix some notation: given d ∈ MA and a ∈ A, we denote by da: A → R the function “currying” a, defined by da(x) = d(a, x). Note that da is d, d<R-Lipschitz, since da(y) − da(x) = d(a, y) − d(a, x) ≤ d(x, y) follows from the triangle inequality.

We are now ready to state the result relating the two distances.

Theorem 1. Let d ∈ MA be any quasimetric. For all [a], α ∈ DA it holds that

  K<(d)([a], α) = W(d)([a], α) = Eα da.

Proof. We start with the Wasserstein distance. The crucial observation is that for point [a], there is informally a single source “pile of earth”: all the probability has to come from a. As a consequence, S[a],α contains a unique strategy Sx,y = [a]x · αy with independent marginals [a] and α. We can then calculate

W(d)([a], α)
  = infS:S[a],α ES d          “definition of W”
  = E(x,y)∼S d(x, y)          “take unique S with independent marginals [a], α”
  = Ey∼α Ex∼[a] d(x, y)       “independence of marginals”
  = Ey∼α d(a, y)              “expectation over point distribution”
  = Eα da

For the Kantorovich distance, we bound it from above by Eα da, as follows:

K<(d)([a], α)
  = supF:Cd,d<R A (EαF − E[a]F)         “definition of K<”
  = supF:Cd,d<R A (EαF − F(a))          “expectation over point distribution”
  = supF:Cd,d<R A Ex∼α (F(x) − F(a))
  ≤ supF:Cd,d<R A Ex∼α d(a, x)          “(2), F is d, d<R-Lipschitz”
  = Eα da

Finally we bound K<(d)([a], α) from below by Eα da, as follows:

K<(d)([a], α)
  = supF:Cd,d<R A (EαF − E[a]F)         “definition of K<”
  ≥ Eα da − E[a]da                      “da ∈ Cd,d<R A”
  = Eα da                               “E[a]da = da(a) = 0”                □

3.2 Computing additive (G, π)-capacity

We now discuss a generic technique for computing the additive (G, π)-capacity, for a given family of gain functions G ⊆ GX, using the Kantorovich distance. Recall that ML+G(π, C) (Def. 1) is defined as the maximum difference between the posterior and prior vulnerabilities Vg[πŻC], Vg[π]. The latter are simply the expected value of Vg over the distributions [π], [πŻC], which are hyper-distributions, having sample space DX.

We start by taking A = DX as the underlying space of the Kantorovich construction. A quasimetric d ∈ MDX measures the distance between two distributions on secrets. The key for bounding ML+G(π, C) from above is to find such a quasimetric d wrt which Vg is Lipschitz for all g ∈ G. Since the Kantorovich distance maximizes Eα′F − EαF over all Lipschitz functions F, it will provide an upper bound to the additive capacity.

Bounding the capacity from below is also possible if there exists some g ∈ G such that dπ = Vg. This is due to the fact that the g-leakage for this g is exactly E[πŻC] dπ.

In the following, given a class of gain functions G ⊆ GX, we denote by VG = {Vg | g ∈ G} the set of g-vulnerabilities induced by G. The bounding technique is formalized in the following result.

Theorem 2. Let d ∈ MDX, let G ⊆ GX and fix a channel C and prior π. Then

  dπ ∈ VG          implies  ML+G(π, C) ≥ k,  and    (3)
  Cd,d<R DX ⊇ VG   implies  ML+G(π, C) ≤ k,         (4)

where k = K<(d)([π],[πŻC]) = W(d)([π],[πŻC]) = E[πŻC] dπ.

Proof. The fact that K<(d)([π],[πŻC]) = W(d)([π],[πŻC]) = E[πŻC] dπ comes from Thm. 1, for A = DX, a = π, α = [πŻC]. We start with (3): for Vg = dπ we have that Vg(π) = 0 and Vg[πŻC] = E[πŻC] dπ, hence ML+G(π, C) ≥ L+g(π, C) = E[πŻC] dπ. For (4) we have that

ML+G(π, C)
  = supVg:VG (E[πŻC]Vg − E[π]Vg)        “definition”
  ≤ supF:Cd,d<R DX (E[πŻC]F − E[π]F)    “sup over larger class”
  = K<(d)([π],[πŻC])                    “definition”                □


So far we have considered an unknown quasimetric d on probability distributions, and identified in Thm. 2 two properties of d that provide bounds for additive leakage. It is not clear, however, whether such a quasimetric exists and what is its relationship with the class G. We now show that the choice of d is in fact canonical for each class. More precisely, for any G we can construct a quasimetric d<G satisfying the second condition of Thm. 2. Furthermore, if a quasimetric d satisfying both conditions (for any π) does exist, then it is unique and equal to d<G.

Theorem 3. Let G ⊆ GX and define a quasimetric d<G: MDX as

  d<G(π, σ) := supg∈G d<R(Vg(π), Vg(σ)).

Then VG ⊆ Cd<G,d<R DX. Moreover, if {dπ | π: DX} ⊆ VG ⊆ Cd,d<R DX holds for some quasimetric d, then d = d<G.

Proof. Let g ∈ G. We trivially have that

  d<R(Vg(π), Vg(σ)) ≤ supg∈G d<R(Vg(π), Vg(σ)) = d<G(π, σ),

hence Vg is d<G, d<R-Lipschitz. Now let d: MDX be such that {dπ | π: DX} ⊆ VG ⊆ Cd,d<R DX and let π, σ: DX. From dπ ∈ VG, we get that

  d(π, σ) = d<R(dπ(π), dπ(σ)) ≤ supg∈G d<R(Vg(π), Vg(σ)) = d<G(π, σ).

Then, since Vg is d, d<R-Lipschitz for all g ∈ G, we get that

  d<G(π, σ) = supg∈G d<R(Vg(π), Vg(σ)) ≤ supg∈G d(π, σ) = d(π, σ),

hence d and d<G coincide.                □

Finally, an important corollary of this technique is that, assuming that d can be computed in time O(|X|), ML+G(π, C) can be computed in time O(|X||Y|). Indeed, calculating E[πŻC] dπ involves computing the output and posterior distributions of [πŻC]. The former can be computed in O(|X||Y|) time via the joint matrix J; then for each posterior δy, we need to construct δy (O(|X|)) and compute d(π, δy) (O(|X|)).
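A sketch of this O(|X||Y|) computation (my own code, not the author's implementation): given any quasimetric d on distributions, the bound E[πŻC] dπ is the outer-weighted average of d(π, δy) over the posteriors; instantiating d with the total variation distance anticipates the G1X capacity of the next subsection.

```python
import numpy as np

def additive_capacity(pi, C, d):
    """E_{[pi > C]} d_pi = sum_y p(y) * d(pi, delta_y): the Kantorovich/Wasserstein
    distance between [pi] and [pi > C] for a quasimetric d on distributions (Thms. 1 and 2)."""
    J = pi[:, None] * C
    p_y = J.sum(axis=0)
    return float(sum(p_y[y] * d(pi, J[:, y] / p_y[y])
                     for y in range(C.shape[1]) if p_y[y] > 0))

def tv(pi, sigma):
    """Total variation distance, computed as half the Manhattan distance."""
    return 0.5 * np.abs(pi - sigma).sum()

pi = np.array([1/3, 1/3, 1/3])
C = np.array([[1, 0, 0, 0],
              [0, 1/2, 1/4, 1/4],
              [1/2, 1/3, 1/6, 0]])
print(additive_capacity(pi, C, tv))   # ≈ 0.361 for the channel of Fig. 1
```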

3.3 Additive capacity for 1-spanning gain functions

We are now ready to provide a complete method for computing additive capacity for an important family of gain functions. The span of a function f: A → R is defined as

  ‖f‖ := supa,a′∈A |f(a) − f(a′)|,

while for gain functions (having two arguments) we define ‖g‖ := supw∈W ‖g(w, ·)‖. Since scaling g causes unbounded leakage, a natural solution is to limit the range of g. This can be done in an elegant way, without completely fixing g’s range, by requiring that ‖g‖ ≤ 1, which brings us to the class G1X of 1-spanning gain functions, the topic of study in this section. In the following we see that this restriction in fact limits the steepness of Vg.

Note that any result for G1X can be straightforwardly extended to k-spanning gain functions, since L+g×k(π, C) = k · L+g(π, C) implies that the k-spanning additive capacity is simply k · ML+G1(π, C). Note also that this class is quite large: any g with a finite number of actions has finite span. For an infinite number of actions, however, it is possible that ‖g‖ = +∞, i.e. that g is not k-spanning for any k; important functions such as Shannon vulnerability fall in this category. In §3.4 we enlarge our class of gain functions to include such cases.

Total variation, steepness and g’s span  To apply the bounding technique of §3.2 to G1X we need a (quasi)metric d ∈ MDX with respect to which Vg is Lipschitz when g is 1-spanning. Conveniently, this is the case of the well-known total variation distance:

  tv(π, π′) := supA⊆X |π(A) − π′(A)|.

For discrete distributions, expressed as vectors, the total variation is equal to d<Rn(π, π′), which is in fact symmetric when restricted to probability distributions (because the elements sum up to 1), and equal to 1/2 the Manhattan distance ‖π − π′‖1. Total variation is a natural choice for DX when X is an “unstructured” space with no underlying metric. Indeed, such spaces can be naturally equipped with the discrete metric dm: MX, defined as dm(x, x′) = 0 iff x = x′ and 1 otherwise. It is well known that the Kantorovich lifting of this metric gives total variation, namely tv = K(dm). Note, however, that the fact that tv is the result of Kantorovich is not important for our goals; our technique involves applying K< to tv itself, lifting it to hyper-distributions.
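A quick numeric sanity check (purely illustrative, not from the paper) of the three equivalent formulations of tv on distributions mentioned above: the supremum over subsets, the asymmetric sum d<Rn, and half the Manhattan distance.

```python
import numpy as np
from itertools import chain, combinations

def tv_subsets(pi, sigma):
    """sup over subsets A of |pi(A) - sigma(A)| (exponential; only for tiny examples)."""
    n = len(pi)
    subsets = chain.from_iterable(combinations(range(n), k) for k in range(n + 1))
    return max(abs(pi[list(A)].sum() - sigma[list(A)].sum()) for A in subsets)

def tv_asymmetric(pi, sigma):
    """d<Rn(pi, sigma) = sum_x max(sigma_x - pi_x, 0); symmetric on distributions."""
    return np.maximum(sigma - pi, 0).sum()

def tv_manhattan(pi, sigma):
    return 0.5 * np.abs(pi - sigma).sum()

pi    = np.array([1/3, 1/3, 1/3])
sigma = np.array([2/3, 0.0, 1/3])
print(tv_subsets(pi, sigma), tv_asymmetric(pi, sigma), tv_manhattan(pi, sigma))  # all equal 1/3
```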

The Lipschitz property wrt tv and the standard Euclidean dR naturally expresses the steepness of Vg. If Vg is k·tv-Lipschitz then the vulnerability can be modified by at most k·ε while changing the probability of any subset of secrets by ε. The larger k is, the steeper Vg can be, i.e. the faster it is allowed to change. It turns out that this property is tightly connected to the span of g, as the following result from [1] states.

Proposition 1. For all g: GX it holds that Vg is ‖g‖·tv-Lipschitz.

As a consequence of the above result we get that ‖Vg‖ ≤ ‖g‖, since |Vg(π) − Vg(π′)| can be no greater than ‖g‖·tv(π, π′) ≤ ‖g‖.

Putting the pieces together  We can finally recover (in a more structured way) the result of [1] for computing the additive (G1, π)-capacity. Denote by 1S(x) the indicator function, equal to 1 if x ∈ S and 0 otherwise.

Theorem 4. Given a channel C and prior π, it holds that

  ML+G1(π, C) = E[πŻC] tvπ.

The capacity is realized by the gain function g: G1X defined by

  W := 2^X,   g(W, x) := 1W(x) − π(W),

for which it holds that Vg = tvπ.⁴

Proof. The result comes from Thm. 2 for d = tv and G = G1X; we show here that the two conditions of this theorem hold.

For the upper bound we need to show that g: G1X implies that Vg is tv, d<R-Lipschitz. But from Prop. 1 we know that Vg is tv, dR-Lipschitz, which is a stronger property since d<R is no greater than dR for all reals.

For the lower bound, we need to show that the claimed gain function g is 1-spanning and Vg(τ) = tvπ(τ). Note that g clearly depends on the fixed π. For the 1-spanning part we have that |g(W, x) − g(W, x′)| = |1W(x) − 1W(x′)| ≤ 1. Moreover, it holds that

Vg(τ)
  = maxW Ex∼τ g(W, x)               “definition of Vg”
  = maxW (Ex∼τ 1W(x) − π(W))        “definition of g”
  = maxW Σx∈W (τx − πx)
  = Σx d<R(πx, τx)                  “take W = {x | τx ≥ πx}”
  = tv(π, τ).                       “tv(π, π′) = d<Rn(π, π′)”                □

In the above proof we showed that tv satisfies the two conditions of Thm. 2. From Thm. 3 we know that there is a unique quasimetric satisfying these conditions, which can be constructed explicitly from the class G = G1X, that is: tv(π, σ) = d<G1(π, σ) = supg:G1X (Vg(σ) − Vg(π)). Note also that tv can be computed in O(|X|) time, hence, as discussed in §3.2, ML+G1(π, C) can be computed in time O(|X||Y|).

3.4 Additive capacity for 1-spanning vulnerability functions

As discussed in the introduction it is often desirable to measure vulnerability within a predefined range, for instance [0, 1] or [0, log2 n] (n = |X|). A natural way to achieve this is to consider k-spanning gain functions, implicitly limiting the range of Vg to an interval of size at most k. This choice, however, excludes important vulnerabilities that cannot be expressed as Vg for any k-spanning g.

For instance, for the Shannon vulnerability function VH (see §2) the additive capacity is equal to the well-known Shannon mutual information. Although VH lies within [0, log2 n] and it can be expressed as VgH for a suitable gH, this gain function is not log2 n-spanning, in fact ‖gH‖ = +∞. As a consequence, the additive capacity ML+G1(π, C), discussed in the previous section, does not provide a bound for gH-leakage. Indeed, the mutual information of the fully transparent identity channel Cid on a uniform prior is L+gH(πu, Cid) = log2 n, which exceeds its additive capacity wrt log2 n-spanning gain functions, which is equal to log2 n · ML+G1(πu, Cid) = ((n−1)/n) log2 n.

Aiming at robustness wrt a larger class of vulnerabilities, we can allow functions Vg that have a bounded range, even though g itself is unbounded. Similarly to G1X, we choose to limit the range of Vg without completely fixing it, by restricting to the class GlX = {g: GX | ‖Vg‖ ≤ 1} of 1-spanning vulnerability functions.

⁴ The choice W = X → {−1, 1}, g(w, x) = (1/2)(w(x) − Eπ w) is also capacity-realizing [1].


Fig. 2. Line connecting π and σ, extended to the boundary of the simplex (showing π, σ, the point π + c(σ − π), and the boundary point πc̄).

Since ‖Vg‖ ≤ ‖g‖, but not vice-versa (as VH demonstrates), it holds that G1X ⊂ GlX. Since any convex (and continuous) function can be expressed as Vg for some properly constructed g [2], VGlX is the class of all 1-spanning convex functions. Note also that, since any Vg, g: GX is bounded, it is k-spanning for some k.

To compute the additive capacity via the technique of §3.2, we need a quasimetric satisfying both conditions of Thm. 2. From Theorem 3 we know that such a quasimetric (if it exists) is unique and equal to the d<G construction. In the previous section, this turned out to be the well-known total variation distance. In this section, on the other hand, we start directly with d<G for our class G = GlX. The resulting quasimetric d<Gl is called the “convex-separation” quasimetric, and is given by

  d<Gl(π, σ) := supg∈GlX d<R(Vg(π), Vg(σ)) = supg∈GlX (Vg(σ) − Vg(π)).

Note that, once again, we removed the max from the definition of d<R since the sup is anyway non-negative.

An important property of d<Gl is that it admits a simple closed-form solution.

Theorem 5. The convex-separation quasimetric d<Gl is equal to

  d<Gl(π, σ) = maxx:⌈π⌉ (1 − σx/πx).

Proof. Let π, σ ∈ DX. Assume π ≠ σ (the case π = σ is trivial) and consider the line in Rⁿ joining the two priors, as shown in Fig. 2. The points on that line, starting from π and moving towards σ, can be written as πc = π + c(σ − π) for c ≥ 0. Continuing on that line, at some point we are going to hit the boundary of the probability simplex DX. Let πc̄ be the point on that boundary, i.e.

  c̄ := max{c | πc ∈ DX}.    (5)

Note that c̄ ≥ 1 since π1 = σ ∈ DX. Now let F: DX → R be convex and 1-spanning. Since σ lies on the line segment between π and πc̄, we can write it as a convex combination

  σ = c̄⁻¹ πc̄ + (1 − c̄⁻¹) π    (6)

with c̄⁻¹ ∈ (0, 1]. From convexity we get that

  F(σ) ≤ c̄⁻¹ F(πc̄) + (1 − c̄⁻¹) F(π),

from which, together with F(πc̄) − F(π) ≤ 1 (F is 1-spanning), we get

  F(σ) − F(π) ≤ c̄⁻¹ (F(πc̄) − F(π)) ≤ c̄⁻¹.    (7)


We now compute c̄, which is given by the maximization problem (5). The problem has a single variable c, and the constraint πc ∈ DX can be expressed by Σx πcx = 1 and the inequalities πcx ≥ 0. The first constraint Σx πcx = 1 is always satisfied by construction of πc. Hence we only need to ensure that πcx = πx + c(σx − πx) ≥ 0 for all x: X. If πx = σx this is always satisfied, and if πx < σx then this imposes a lower bound on c. The only interesting case is when πx > σx, which gives us an upper bound c ≤ πx/(πx − σx). The max c satisfying all upper bounds is equal to the smallest of them:

  c̄ = minx:πx>σx πx/(πx − σx).

Replacing c̄ in (7) we get

  F(σ) − F(π) ≤ maxx:πx>σx (πx − σx)/πx = maxx:⌈π⌉ (1 − σx/πx).

We finally show that the above bound is attainable. Define

  Fπ(τ) := maxx:⌈π⌉ (1 − τx/πx).

Fπ is convex as the max of convex (in fact linear) functions of τ, so it can be expressed as Vg (see Thm. 6 for the exact g). Moreover, Fπ(π) = 0, hence Fπ(σ) − Fπ(π) = maxx:⌈π⌉ (1 − σx/πx), which concludes the proof.                □

We can now use d<Gl to compute the additive (Gl, π)-capacity.

Theorem 6. Given a channel C and prior π, it holds that

  ML+Gl(π, C) = E[πŻC] d<Glπ = 1 − Σy:Y minx:⌈π⌉ Cx,y.

The capacity is realized by the complement of the “π-reciprocal” gain function

  W = ⌈π⌉,   gcπ–1(w, x) = 1 − 1/πx if w = x,   1 if w ≠ x,

for which it holds that Vgcπ–1 = d<Glπ, and as a consequence gcπ–1: GlX.

Proof. The result comes from Thm. 2 for d = d<Gl and G = GlX; we show here that the two conditions of the theorem hold. The second is satisfied automatically by the construction of d<Gl (Thm. 3). For the first condition, after simple calculations we find that the gcπ–1-vulnerability function is equal to Vgcπ–1(σ) = maxx:⌈π⌉ (1 − σx/πx), hence from Thm. 5 we have that Vgcπ–1 = d<Glπ. Finally, simple calculations show that Vgcπ–1[πŻC] = 1 − Σy minx∈⌈π⌉ Cx,y, which concludes the proof since Vgcπ–1(π) = d<Glπ(π) = 0.                □

Note that the capacity-realizing gain function gcπ–1 essentially “cancels out” the effect of the prior, making ML+Gl(π, C) independent from π, and equal to 1 minus the sum of the column minima of C (ignoring the rows when πx = 0). Remarkably,


the “π-reciprocal” gain function gπ–1 (the complement of gcπ–1) produces the same “cancellation” effect for multiplicative (G+, π)-capacity, making it independent from π. An observation that can be made from Thm. 6 is that the capacity-realizing g is k-spanning for k = maxx:⌈π⌉ 1/πx. From this we can conclude that ML+Gl(π, C) ≤ k · ML+G1(π, C) for k = maxx:⌈π⌉ 1/πx. In particular ML+Gl(πu, C) ≤ |X| · ML+G1(πu, C) for the uniform πu.
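Both closed forms derived above are straightforward to implement; the sketch below (my own code, following Thms. 5 and 6, not the author's) computes the convex-separation quasimetric and the additive (Gl, π)-capacity. The final comment refers back to the hypothetical additive_capacity helper sketched at the end of §3.2.

```python
import numpy as np

def d_conv_sep(pi, sigma):
    """Convex-separation quasimetric (Thm. 5): max over the support of pi of 1 - sigma_x/pi_x."""
    supp = pi > 0
    return float(np.max(1 - sigma[supp] / pi[supp]))

def ML_add_Gl(pi, C):
    """Additive (GlX, pi)-capacity (Thm. 6): 1 - sum_y min_{x in support(pi)} C[x, y]."""
    supp = pi > 0
    return 1.0 - C[supp, :].min(axis=0).sum()

pi = np.array([1/3, 1/3, 1/3])
C = np.array([[1, 0, 0, 0],
              [0, 1/2, 1/4, 1/4],
              [1/2, 1/3, 1/6, 0]])
print(ML_add_Gl(pi, C))     # 1.0: every column of the Fig. 1 channel has minimum 0
# The same value is obtained via the generic technique of Section 3.2:
# additive_capacity(pi, C, d_conv_sep) from the earlier sketch.
```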

A final note about our use of quasimetrics. Although the total variation tv, used in §3.3 for G1X, is a proper metric, d<Gl used in this section for GlX is not, since it is not symmetric. This is why we had to work with quasimetrics; it is certainly possible to define a symmetric variant of d<Gl (e.g. as max{d<Gl(π, σ), d<Gl(σ, π)}), however this metric would not satisfy both conditions of Thm. 2. Recall that for each class G there can be at most one quasimetric satisfying both properties, and for GlX this is d<Gl.

3.5 Maximize over both g and π

This scenario was left open in [1] (which uses the class G1X), since ML+G1(π, C) depends on the prior, and maximizing it over π is challenging. Our results on the larger class GlX, however, lead to a complete solution since ML+Gl(π, C) is independent from π. By Thm. 6, any full-support π with gcπ–1 are capacity-realizing, giving

  ML+Gl(D, C) = 1 − Σy minx Cx,y.

4 The additive miracle theorem

The multiplicative Bayes-capacity ML×gid(D, C) is well known to be realized on a uniform prior, and is equal to the sum of the column maxima of C [9,24]. A result from [3], which was surprising enough to be named “Miracle”, states that ML×gid(D, C) is in fact a universal upper bound for multiplicative leakage (wrt non-negative g’s).

Theorem 7 (“Miracle”, [3]). For any C, π: DX, and non-negative g: G+X, we have

  L×g(π, C) ≤ Σy maxx Cx,y = ML×gid(D, C).

In [1], this theorem was used to easily conclude that the capacity ML×G+(π, C) is equal to ML×gid(D, C) for any full-support π. In the additive case, having already a solution for ML+Gl(π, C), we can go in the opposite direction and obtain an additive variant of the miracle theorem. Denote by gcid = 1 − gid the complement of gid.

Theorem 8 (“Additive Miracle”). For any C, π: DX, and g: GlX, we have

  L+g(π, C) ≤ 1 − Σy minx Cx,y = |X| · ML+gcid(D, C).

Proof. The inequality is a direct consequence of Thm. 6; note that it holds for any prior since 1 − Σy:Y minx:X Cx,y ≥ 1 − Σy:Y minx:⌈π⌉ Cx,y. Now let g = gcid × |X|, for which it holds that ML+g(D, C) = |X| · ML+gcid(D, C). We have that Vg = |X|·(1 − minx:X πx). For the uniform πu we compute L+g(πu, C) = 1 − Σy minx Cx,y, and since g: GlX, this is an upper bound for all π: DX, and hence equal to ML+g(D, C).                □


The multiplicative and additive miracle theorems are similar in nature, although they do have several differences. They both provide a universal bound for leakage, which holds for all priors and all gain functions within a certain class. In the multiplicative case, this is the class G+X of non-negative gain functions, while in the additive case, the class GlX of gain functions producing a 1-spanning Vg. In the multiplicative case the bound is given by the sum of column maxima of C, while in the additive case by 1 minus the sum of column minima. In the multiplicative case the bound coincides with the (gid, D)-capacity for the identity gain function (i.e. the Bayes-capacity), which is realized on a uniform prior. In the additive case the bound is (|X| times) the (gcid, D)-capacity for the “complement of identity” gain function, also realized on a uniform prior.

Example  Consider the case X = {x1, x2} with a gain function penalizing wrong guesses, defined as W = X, and g(w, x) = 1 iff w = x and −1 otherwise. Note that Vg is always non-negative since the probability of a correct guess is at least 0.5 (for |X| = 2). For a uniform prior πu, both guesses are equivalent, giving expected gain 0.5·1 + 0.5·(−1) = 0, so Vg(πu) = 0. Now consider the channel C below, which gives rather good information about the secret.

  C    y1   y2
  x1   0.8  0.2
  x2   0.2  0.8

Computing the two posteriors we get δy1 = (0.8, 0.2) and δy2 = (0.2, 0.8), which both give g-vulnerability Vg(δy1) = Vg(δy2) = 0.8 − 0.2 = 0.6. Hence Vg[πuŻC] = 0.6 and as a consequence L×g(πu, C) = +∞, clearly larger than the multiplicative Bayes capacity ML×gid(D, C) = 0.8 + 0.8 = 1.6. The miracle theorem does not apply here since g takes negative values.

On the other hand, Vg is 1-spanning (although g itself is 2-spanning), since its value is at least 0 (for a uniform prior) and at most 1 (for a point prior). As a consequence the additive miracle theorem applies, guaranteeing that L+g(πu, C) ≤ 1 − Σy minx Cx,y = 1 − 0.2 − 0.2 = 0.6. Indeed L+g(πu, C) = 0.6 − 0 = 0.6, exactly matching the bound.                □
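The numbers in this example are easy to check mechanically; a small verification sketch (mine, with the gain function encoded as a matrix):

```python
import numpy as np

pi = np.array([0.5, 0.5])
C = np.array([[0.8, 0.2],
              [0.2, 0.8]])
G = np.array([[1, -1],       # g(w, x) = 1 if w = x, -1 otherwise
              [-1, 1]])

def V(p):                    # g-vulnerability for the finite action set W = X
    return np.max(G @ p)

J = pi[:, None] * C
p_y = J.sum(axis=0)
posterior_V = sum(p_y[y] * V(J[:, y] / p_y[y]) for y in range(2))

print(V(pi))                       # 0.0  prior g-vulnerability
print(posterior_V)                 # 0.6  posterior g-vulnerability
print(posterior_V - V(pi))         # 0.6  additive leakage
print(1 - C.min(axis=0).sum())     # 0.6  additive-miracle bound, matched exactly
```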

5 Conclusion and future work

We studied the problem of computing additive g-capacities. Extending the Kantorovich technique of [1] with quasimetrics, we provided a solution for the class GlX of 1-spanning vulnerabilities, which, in contrast to G1X, can include any vulnerability function (by scaling). The results also provided a solution to the problem of maximizing leakage over both π and g, and led to an additive variant of the miracle theorem of [3].

In future work we plan to study approximation algorithms for all scenarios, especially ML+g(D, C) which is NP-complete in general. Moreover, we aim at developing a theory that unifies the two main approaches to robustness, namely capacity and refinement.

References

1. Alvim, M.S., Chatzikokolakis, K., McIver, A., Morgan, C., Palamidessi, C., Smith, G.: Additive and multiplicative notions of leakage, and their capacities. In: Proc. of CSF. pp. 308–322. IEEE (2014)
2. Alvim, M.S., Chatzikokolakis, K., McIver, A., Morgan, C., Palamidessi, C., Smith, G.: Axioms for information leakage. In: Proc. of CSF. pp. 77–92 (2016)
3. Alvim, M.S., Chatzikokolakis, K., Palamidessi, C., Smith, G.: Measuring information leakage using generalized gain functions. In: Proc. of CSF. pp. 265–279 (2012)
4. Antonopoulos, T., Gazzillo, P., Hicks, M., Koskinen, E., Terauchi, T., Wei, S.: Decomposition instead of self-composition for proving the absence of timing channels. In: PLDI. pp. 362–375. ACM (2017)
5. Backes, M., Köpf, B., Rybalchenko, A.: Automatic discovery and quantification of information leaks. In: Proc. of S&P. pp. 141–153 (2009)
6. Barthe, G., Köpf, B.: Information-theoretic bounds for differentially private mechanisms. In: Proc. of CSF. pp. 191–204 (2011)
7. Biondi, F., Kawamoto, Y., Legay, A., Traonouez, L.: Hyleak: Hybrid analysis tool for information leakage. In: ATVA. LNCS, vol. 10482, pp. 156–163. Springer (2017)
8. Biondi, F., Legay, A., Traonouez, L.M., Wąsowski, A.: Quail: A quantitative security analyzer for imperative code. In: Proc. of CAV. pp. 702–707. Springer (2013)
9. Braun, C., Chatzikokolakis, K., Palamidessi, C.: Quantitative notions of leakage for one-try attacks. In: Proc. of MFPS. ENTCS, vol. 249, pp. 75–91. Elsevier (2009)
10. Chatzikokolakis, K., Chothia, T., Guha, A.: Statistical measurement of information leakage. In: Proc. of TACAS. pp. 390–404 (2010)
11. Chatzikokolakis, K., Palamidessi, C., Panangaden, P.: On the Bayes risk in information-hiding protocols. J. of Comp. Security 16(5), 531–571 (2008)
12. Clarkson, M.R., Schneider, F.B.: Quantification of integrity. In: Proc. of CSF. pp. 28–43 (2010)
13. Cover, T.M., Thomas, J.A.: Elements of Information Theory. J. Wiley & Sons, Inc., second edn. (2006)
14. Doychev, G., Köpf, B.: Rigorous analysis of software countermeasures against cache attacks. In: PLDI. pp. 406–421. ACM (2017)
15. Heusser, J., Malacaria, P.: Quantifying information leaks in software. In: Proc. ACSAC ’10. pp. 261–269 (2010)
16. Khouzani, M.H.R., Malacaria, P.: Relative perfect secrecy: Universally optimal strategies and channel design. In: Proc. of CSF. pp. 61–76 (2016)
17. Köpf, B., Basin, D.: An information-theoretic model for adaptive side-channel attacks. In: Proc. of CCS. pp. 286–296 (2007)
18. Köpf, B., Mauborgne, L., Ochoa, M.: Automatic quantification of cache side-channels. In: Proc. of CAV ’12. pp. 564–580 (2012)
19. Köpf, B., Rybalchenko, A.: Approximation and randomization for quantitative information-flow analysis. In: Proc. of CSF. pp. 3–14 (2010)
20. Köpf, B., Smith, G.: Vulnerability bounds and leakage resilience of blinded cryptography under timing attacks. In: Proc. of CSF. pp. 44–56 (2010)
21. Malacaria, P.: Assessing security threats of looping constructs. In: Proc. of POPL. pp. 225–235 (2007)
22. McIver, A., Meinicke, L., Morgan, C.: Compositional closure for Bayes risk in probabilistic noninterference. In: Proc. ICALP ’10. pp. 223–235 (2010)
23. Meng, Z., Smith, G.: Calculating bounds on information leakage using two-bit patterns. In: Proc. of PLAS. pp. 1:1–1:12 (2011)
24. Smith, G.: On the foundations of quantitative information flow. In: Proc. of FOSSACS. LNCS, vol. 5504, pp. 288–302. Springer (2009)
25. Sweet, I., Trilla, J.M.C., Scherrer, C., Hicks, M., Magill, S.: What’s the over/under? Probabilistic bounds on information leakage. In: POST. LNCS, vol. 10804, pp. 3–27. Springer (2018)
26. Villani, C.: Topics in optimal transportation. No. 58, American Mathematical Soc. (2003)
27. Yasuoka, H., Terauchi, T.: Quantitative information flow — verification hardness and possibilities. In: Proc. of CSF. pp. 15–27 (2010)
