Content-Identification : towards Capacity Based on Bounded Distance Decoder

FARHADZADEH, Farzad, VOLOSHYNOVSKYY, Svyatoslav, KOVAL, Oleksiy

FARHADZADEH, Farzad, VOLOSHYNOVSKYY, Svyatoslav, KOVAL, Oleksiy. Content-Identification : towards Capacity Based on Bounded Distance Decoder. In: Applications of Information Theory, Coding and Security, Yerevan, Armenia, 2010.

Available at: http://archive-ouverte.unige.ch/unige:47638

Farzad Farhadzadeh, Sviatoslav Voloshynovskiy, and Oleksiy Koval

University of Geneva, Stochastic Information Processing (SIP) Group

Abstract

In recent years, content identification based on digital fingerprinting has attracted a lot of attention in different emerging applications. In this paper, we perform an information-theoretic analysis of finite-length digital fingerprinting systems. We show that the identification capacity over the Binary Symmetric Channel (BSC) is achievable by using a Bounded Distance Decoder (BDD).

1 Introduction

In the last 10 years, content identification based on digital fingerprinting has evolved impressively from a mere alternative to digital watermarking in copyright protection into a standalone domain of research.

A digital fingerprint is a short, robust and distinctive content description allowing fast and privacy-preserving operations. All operations are performed on the fingerprint instead of on the original large and privacy-sensitive data.

Some important practical and theoretical achievements have been reported in recent years. The main efforts on the side of practical algorithms have concentrated on robust feature selection and fast indexing techniques, mostly borrowed from content-based retrieval applications [1], [2]. The information-theoretic limits of content identification under infinite-length and ergodic assumptions have been investigated by Willems et al. [3] using the jointly typical decoder.

In this paper, we introduce an information-theoretic framework for the analysis of content identification systems based on finite-length fingerprinting. Contrary to previous works, we consider alternative decoding rules and demonstrate their capability to achieve the identification capacity limit.

2 Identification Setup

The identification problem is to determine whether a query $y$ is related to some element of the database and, if so, to identify which one. To this end, an algorithm $\psi(y)$ must be designed, returning:

$$\psi(y) \in \{0, 1, 2, \ldots, 2^{NR}\}, \tag{1}$$

where $N$ is the database entry length, $R$ is the identification rate, and the decision $\psi(y) = 0$ indicates that $y$ is unrelated to any database element. In this paper, we consider the identification problem for binary fingerprinting schemes. This scenario can be modeled as a multiple hypothesis test:

$H_0$: The query $Y$ is unrelated to any database entry,

$H_m$: The query $Y$ is related to the $m$th entry of the database.

We assume that the fingerprint bits are i.i.d. and equally likely to be 0 or 1, and that the probabilistic mismatch between the channel input $X$ and output $Y$ is modeled by the BSC.

The decoder performance can be evaluated in terms of:

• Probability of false positive: $P_{FP} \triangleq \Pr\{\psi(y) > 0 \mid H_0\}$,

• Probability of false negative: $P_{FN} \triangleq \sum_{m=1}^{2^{NR}} \Pr\{\psi(y) \neq m \mid H_m\}$.

2.1 The Decoding Rule

Under this model, upon receiving a query $y$, the Bounded Distance decision rule can be defined as:

$$\psi(y) = \begin{cases} \tilde{m}, & \text{if for a unique } \tilde{m},\ d_H(x(\tilde{m}), y) \le \gamma N, \\ 0, & \text{otherwise,} \end{cases} \tag{2}$$

where $\gamma$ is the normalized threshold and $d_H(\cdot,\cdot)$ denotes the Hamming distance.
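The decision rule (2) can be sketched in a few lines of Python. This is only an illustrative sketch: the names `hamming_distance` and `bounded_distance_decode`, the list-of-bits representation and the toy database are assumptions made here, not part of the paper.

```python
from typing import List, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions in which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def bounded_distance_decode(query: Sequence[int],
                            database: List[Sequence[int]],
                            gamma: float) -> int:
    """Rule (2): return the 1-based index of the unique entry whose Hamming
    distance to the query is at most gamma * N, and 0 otherwise."""
    threshold = gamma * len(query)
    hits = [m for m, entry in enumerate(database, start=1)
            if hamming_distance(entry, query) <= threshold]
    # Zero hits or several hits both lead to the "unrelated" decision 0.
    return hits[0] if len(hits) == 1 else 0


# Hypothetical toy database with N = 8 (illustration only).
database = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
query = [1, 1, 1, 0, 0, 0, 0, 0]  # entry 2 with one flipped bit
print(bounded_distance_decode(query, database, gamma=0.25))  # prints 2
```

With $\gamma = 0.25$ and $N = 8$ the decoding radius is 2 bits, so only entry 2 falls inside the ball around the query. Enlarging $\gamma$ until several entries fall inside the ball makes the decoder return 0, which is the ambiguity event counted in the false negative analysis.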

3 Error Exponents

In this section, we derive bounds on the probabilities of false positive and false negative using i.i.d. binary fingerprints with the decision rule given by (2).

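The probability of false positive is given by:

$$\begin{aligned} P_{FP} &= \Pr\Big\{\bigcup_{m=1}^{2^{NR}} \{d_H(Y, X(m)) \le \gamma N\} \,\Big|\, H_0\Big\} \\ &\overset{(a)}{\le} \sum_{m=1}^{2^{NR}} \Pr\{d_H(Y, X(m)) \le \gamma N \mid H_0\} \\ &\overset{(b)}{=} 2^{NR} \Pr\{d_H(Y, X(1)) \le \gamma N \mid H_0\}, \end{aligned} \tag{3}$$

where (a) follows from the union bound, and (b) follows from the fact that the fingerprints $X(m)$ are i.i.d. Under $H_0$, $Y$ and $X(1)$ are independent, thus $d_H(Y, X(m)) \sim \mathcal{B}(N, \tfrac{1}{2})$. By using the Chernoff bound on the tail of the binomial distribution $\mathcal{B}(N, \tfrac{1}{2})$, the probability of false positive can be bounded for $\gamma \le \tfrac{1}{2}$ as:

$$P_{FP} \le 2^{NR}\, 2^{-N d(\gamma \| \frac{1}{2})} = 2^{-N(1 - H_2(\gamma) - R)}, \tag{4}$$

where $d(\alpha \| \beta) \triangleq \alpha \log_2 \frac{\alpha}{\beta} + (1 - \alpha) \log_2 \frac{1 - \alpha}{1 - \beta}$ is the divergence between $0 \le \alpha \le 1$ and $\beta$, $H_2(\gamma)$ denotes the binary entropy, and the equality uses $d(\gamma \| \tfrac{1}{2}) = 1 - H_2(\gamma)$.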
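As a numerical sanity check of (3) and (4), the sketch below compares the union bound evaluated with the exact binomial tail against the Chernoff-based bound. The parameter values $N = 128$, $R = 0.25$, $\gamma = 0.125$ and all function names are assumptions chosen here for illustration only.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def divergence(alpha: float, beta: float) -> float:
    """d(alpha || beta) in bits, for 0 <= alpha <= 1 and 0 < beta < 1."""
    total = 0.0
    if alpha > 0:
        total += alpha * math.log2(alpha / beta)
    if alpha < 1:
        total += (1 - alpha) * math.log2((1 - alpha) / (1 - beta))
    return total


def binomial_lower_tail(n: int, k: int) -> float:
    """Exact Pr{B(n, 1/2) <= k}."""
    return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n


# Illustrative parameters (assumed, not taken from the paper).
N, R, gamma = 128, 0.25, 0.125
k = math.floor(gamma * N)

# Right-hand side of (3) with the exact tail probability.
exact_union = 2 ** (N * R) * binomial_lower_tail(N, k)
# Right-hand side of (4).
chernoff = 2 ** (-N * (1 - binary_entropy(gamma) - R))

print(f"union bound with exact tail: {exact_union:.3e}")
print(f"Chernoff-based bound (4):    {chernoff:.3e}")
```

The Chernoff-based value is looser than the exact tail, as expected, but it decays with the same exponent $1 - H_2(\gamma) - R$, which is what matters for the capacity argument.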

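Now consider the probability of false negative $P_{FN}$. Given that $H_m$, $m \neq 0$, is true, the false negative event occurs if $d_H(Y, X(m)) > \gamma N$ or if there is any $m' \neq m$ such that $d_H(Y, X(m')) \le \gamma N$. Thus the probability of false negative is bounded as:

$$\begin{aligned} P_{FN} &= \Pr\Big\{\{d_H(Y, X(m)) > \gamma N\} \cup \bigcup_{m' \neq m} \{d_H(Y, X(m')) \le \gamma N\} \,\Big|\, H_m\Big\} \\ &\overset{(a)}{\le} \Pr\{d_H(Y, X(m)) > \gamma N \mid H_m\} + \sum_{m' \neq m} \Pr\{d_H(Y, X(m')) \le \gamma N \mid H_m\} \\ &\overset{(b)}{\le} 2^{-N d(\gamma \| P_b)} + 2^{-N(1 - H_2(\gamma) - R)}, \end{aligned} \tag{5}$$

where (a) follows from the union bound, and (b) follows from the Chernoff bound and the facts that, under $H_m$, $d_H(Y, X(m)) \sim \mathcal{B}(N, P_b)$, where $P_b$ is the crossover probability of the BSC, and $d_H(Y, X(m')) \sim \mathcal{B}(N, \tfrac{1}{2})$ for $m' \neq m$.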

To consider the probabilities of false positive and false negative together, we bound the sum of $P_{FP}$ and $P_{FN}$. Combining (4) and (5) gives:

$$P_{FP} + P_{FN} \le 2^{-N d(\gamma \| P_b)} + 2^{-N\left(1 - R - H_2(\gamma) - \frac{1}{N}\right)}. \tag{6}$$

One can conclude that for $P_b < \gamma \le \tfrac{1}{2}$ and $H_2(\gamma) < 1 - R - \tfrac{1}{N}$, both exponents are positive, so there exist fingerprints with rate $R$ such that $\lim_{N \to \infty}(P_{FP} + P_{FN}) = 0$. Since this holds for $\gamma$ arbitrarily close to $P_b$, we claim that $R = 1 - H_2(P_b)$ is achievable; from [3], this is the identification capacity of this setup.
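To make the asymptotic statement concrete, the following sketch evaluates the right-hand side of (6) for growing $N$. The values $P_b = 0.1$, $\gamma = 0.15$ and $R = 0.3$ are assumptions picked here so that $P_b < \gamma$ and $R < 1 - H_2(\gamma)$; the function names are likewise illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def divergence(alpha: float, beta: float) -> float:
    """d(alpha || beta) in bits, for 0 <= alpha <= 1 and 0 < beta < 1."""
    total = 0.0
    if alpha > 0:
        total += alpha * math.log2(alpha / beta)
    if alpha < 1:
        total += (1 - alpha) * math.log2((1 - alpha) / (1 - beta))
    return total


def total_error_bound(n: int, rate: float, gamma: float, pb: float) -> float:
    """Right-hand side of (6): bound on P_FP + P_FN."""
    miss_term = 2 ** (-n * divergence(gamma, pb))
    collision_term = 2 ** (-n * (1 - rate - binary_entropy(gamma) - 1 / n))
    return miss_term + collision_term


# Illustrative operating point (assumed, not from the paper).
pb, gamma, rate = 0.1, 0.15, 0.3
capacity = 1 - binary_entropy(pb)

lengths = (100, 200, 400, 800)
bounds = [total_error_bound(n, rate, gamma, pb) for n in lengths]
for n, b in zip(lengths, bounds):
    print(f"N = {n:4d}: bound on P_FP + P_FN = {b:.3e}")
```

At this operating point the bound is dominated by the term $2^{-N d(\gamma \| P_b)}$, because $\gamma$ sits close to $P_b$. Moving $\gamma$ away from $P_b$ speeds up that term but lowers the largest rate $1 - H_2(\gamma)$ for which the second term still vanishes.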

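We now investigate the “best” upper bound on the sum of the false positive and false negative probabilities. It follows directly that

$$P_{FP} + P_{FN} \le 2 \times 2^{-N \min\left[d(\gamma \| P_b),\ 1 - H_2(\gamma) - R - \frac{1}{N}\right]}. \tag{7}$$

Therefore, the best bound is achieved by $\gamma_{opt}$, which makes both terms equal. Since $d(\gamma \| P_b) = -H_2(\gamma) - \gamma \log_2 P_b - (1 - \gamma) \log_2(1 - P_b)$, the entropy terms cancel and the optimum value of the threshold is given by:

$$\gamma_{opt} = \frac{1 - R + \log_2(1 - P_b) - \frac{1}{N}}{\log_2 \frac{1 - P_b}{P_b}}. \tag{8}$$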

Note that as $N \to \infty$, $\gamma_{opt}$ converges to the optimum threshold under the communication setup [4].
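The threshold (8) can be checked numerically by confirming that it equalizes the two exponents in (7). The sketch below does this for one assumed operating point ($N = 512$, $R = 0.3$, $P_b = 0.1$); the function name `optimal_threshold` is introduced here for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def divergence(alpha: float, beta: float) -> float:
    """d(alpha || beta) in bits, for 0 <= alpha <= 1 and 0 < beta < 1."""
    total = 0.0
    if alpha > 0:
        total += alpha * math.log2(alpha / beta)
    if alpha < 1:
        total += (1 - alpha) * math.log2((1 - alpha) / (1 - beta))
    return total


def optimal_threshold(n: int, rate: float, pb: float) -> float:
    """Threshold (8), which makes the two exponents in (7) equal."""
    numerator = 1 - rate + math.log2(1 - pb) - 1 / n
    return numerator / math.log2((1 - pb) / pb)


# Illustrative operating point (assumed, not from the paper).
n, rate, pb = 512, 0.3, 0.1
gamma_opt = optimal_threshold(n, rate, pb)

miss_exponent = divergence(gamma_opt, pb)
collision_exponent = 1 - binary_entropy(gamma_opt) - rate - 1 / n

print(f"gamma_opt          = {gamma_opt:.5f}")
print(f"d(gamma_opt || Pb) = {miss_exponent:.5f}")
print(f"1 - H2 - R - 1/N   = {collision_exponent:.5f}")
```

Dropping the $1/N$ term in `optimal_threshold` gives the limiting value $\left(1 - R + \log_2(1 - P_b)\right) / \log_2\frac{1 - P_b}{P_b}$ that $\gamma_{opt}$ approaches as $N$ grows.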

4 Conclusion

In this paper, we have presented a framework for content identification based on binary fingerprints. By performing an information-theoretic analysis of finite-length digital fingerprinting systems, we have proved that, by using the BDD, the identification capacity limit under asymptotic assumptions is achievable.

Acknowledgement

This paper was partially supported by SNF project 200021-119770.

References

[1] J. Haitsma, T. Kalker, and J. Oostveen, “Robust audio hashing for content identification,” in International Workshop on CBMI, 2001.

[2] F. Lefebvre and B. Macq, “RASH: RAdon Soft Hash algorithm,” in Proc. of EUSIPCO, Toulouse, France, 2002.

[3] F. Willems, T. Kalker, J. Goseling, and J. Linnartz, “On the capacity of a biometrical system,” in Proc. 2003 IEEE ISIT, Yokohama, Japan.

[4] https://www.sps.ele.tue.nl/members/F.M.J.Willems.
