
HAL Id: hal-01226923

https://hal.archives-ouvertes.fr/hal-01226923

Submitted on 16 Nov 2015


Amos Lapidoth, Andreas Malär, Ligong Wang

To cite this version:

Amos Lapidoth, Andreas Malär, Ligong Wang. Covering Point Patterns. IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2015, 10.1109/TIT.2015.2453946.

hal-01226923


Covering Point Patterns

Amos Lapidoth, Fellow, IEEE, Andreas Malär, and Ligong Wang, Member, IEEE

Abstract—A source generates a point pattern consisting of a finite number of points in an interval. Based on a binary description of the point pattern, a reconstructor must produce a covering set that is guaranteed to contain the pattern. We study the optimal trade-off (as the length of the interval tends to infinity) between the description length and the least average Lebesgue measure of the covering set. The trade-off is established for point patterns that are generated by homogeneous and inhomogeneous Poisson processes. The homogeneous Poisson process is shown to be the most difficult to describe among all point patterns.

We also study a Wyner-Ziv version of this problem, where some of the points in the pattern are revealed to the reconstructor but not to the encoder. We show that this scenario is as good as when they are revealed to both encoder and reconstructor. A connection between this problem and the queueing distortion is established via feedforward. Finally, we establish the aforementioned trade-off when the covering set is allowed to miss some of the points in the pattern at a certain cost.

Index Terms—Poisson process, rate-distortion problem, side information, Wyner-Ziv problem, feedforward.

I. INTRODUCTION

IMAGINE a controller that receives a request that a computer be on at certain epochs. If the controller could describe these epochs to the computer with infinite precision, then the computer would turn itself on only at these epochs and be in sleep mode at all other times. If the controller cannot describe the epochs at all, then the computer must be on all the time. In this paper we study the trade-off between the bit rate with which the epochs can be described and the percentage of time the computer must be on.

More specifically, we consider a source that generates a "point pattern" consisting of a finite number of points in the interval [0, T]. Based on a binary description of the pattern, a reconstructor must produce a "covering set": a subset of [0, T] containing all the points. There is a trade-off between the description length and the minimal Lebesgue measure of the covering set. This trade-off is formulated as a continuous-time rate-distortion problem in Section II. In this paper we investigate this trade-off in the limit where T tends to infinity.

This work was presented in part at the 2011 IEEE International Symposium on Information Theory (ISIT) in Saint Petersburg, Russia.

A. Lapidoth is with ETH Zurich, Switzerland (e-mail: lapidoth@isi.ee.ethz.ch).

A. Malär was with the Signal and Information Processing Laboratory, ETH Zurich, Switzerland. He is now with Malcom AG, Zurich, Switzerland (e-mail: andreas@malcom.ch).

L. Wang is with ETIS, CNRS UMR 8051, ENSEA, Université de Cergy-Pontoise, France (e-mail: ligong.wang@ensea.fr). Part of this work was conducted while L. Wang was with the Signal and Information Processing Laboratory, ETH Zurich, Switzerland.

Communicated by Jun Chen, Associate Editor for Shannon Theory.

Copyright © 2015 IEEE. Personal use of this material is permitted.

However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

For point patterns that are generated by a homogeneous Poisson process of intensity λ, we show that, for the reconstructor to produce covering sets of average measure not exceeding DT, the required description rate in bits per second is −λ log D [1]. This result is closely related to results on the capacity of the ideal peak-limited Poisson channel [2]–[5]. In fact, in the spirit of [6], the two problems may be considered dual, although this duality is not exactly the same as the one formulated in [6]. Also, [6] considers discrete-time channels and sources, whereas our problem is in continuous time.

Rate-distortion problems for Poisson processes under different distortion measures were studied in [7]–[11]. It is interesting that our rate-distortion function, −λ log D, is equal to the one in [11], where a queueing distortion measure was considered. This is no coincidence: the connection between our and the queueing distortion measure is established when we introduce feedforward to the problem in Section VI. This connection is similar to the connection between the Poisson channel and the queueing channel [12].

One nice feature of our distortion measure is that it can be naturally extended to any measure space on which a Poisson process can be defined, whereas the previously-studied distortion measures [7]–[11] only apply to Poisson processes on the real line.

We also derive a formula for the required description rate when the point pattern is generated by an inhomogeneous Poisson process. More generally, we show that the homogeneous Poisson process is the most difficult to cover, in the sense that any point process that, with high probability, has no more than λT points in [0, T] can be described with −λ log D bits per second. This is true even if an adversary selects an arbitrary point pattern, provided that the encoder and the reconstructor are allowed to use random codes. Here we note that, after the appearance of [1], a stronger result was shown in [13]: deterministic codes of the same rate are sufficient to cover arbitrary point patterns.

We then consider a Wyner-Ziv setting [14] of the problem, where some points in the pattern are revealed to the reconstructor but the encoder does not know which ones. This can be viewed as a dual problem to the Poisson channel with noncausal side information [15]. We show that in this setting one can achieve the same minimum rate as when the transmitter does know the reconstructor's side information.

Finally we consider the case where the covering set is allowed to miss some of the points in the pattern, and where every missed point adds a certain cost to the overall distortion.

The exact expression for the rate-distortion function is established for the homogeneous Poisson process. This problem was also considered in [13] in a perhaps more natural setting.


Fig. 1. Illustration of the problem. Here x̂ is not allowed because it does not cover all the points.

A. Notation

We use lower-case letters like x to denote numbers, and upper-case letters like X to denote random variables. Exceptions to this rule include T for time, R for rate, and D for expected distortion, which are all deterministic. Boldface lower-case letters like x denote vectors, functions from the reals, or point patterns, depending on the context. If x is a vector, x_i denotes its ith element. If x is a function, x(t) denotes its value at t ∈ R. And if x is a point pattern, n_x(·) denotes its counting function, with n_x(t2) − n_x(t1) being the number of points in x that fall in the interval (t1, t2]. Boldface upper-case letters like X denote random vectors, random functions, or random point processes. The random counting function corresponding to a point process X is denoted N_X(·).

We denote by Ber(p) the Bernoulli distribution of parameter p, which assigns probability p to the outcome 1 and probability (1 − p) to the outcome 0.

Unless stated otherwise, we use the binary logarithm and measure information in bits throughout this paper.

The rest of this paper is arranged as follows: in Sections II and III we present our results for homogeneous and inhomogeneous Poisson processes, respectively; in Section IV we present our results for general point processes and arbitrary point patterns; in Section V we discuss the Wyner-Ziv setting; in Section VI we introduce feedforward and demonstrate a coding scheme based on existing work on the queueing distortion; in Section VII we present our result for the case where missing a point is allowed but at some cost; and in Section VIII we conclude the paper with some remarks.

II. HOMOGENEOUS POISSON PROCESSES

Consider a homogeneous Poisson process X of intensity λ on the interval [0, T]. Its counting function N_X(·) satisfies

\[ \Pr\big[N_{\mathbf{X}}(t+\tau) - N_{\mathbf{X}}(t) = k\big] = \frac{e^{-\lambda\tau}(\lambda\tau)^k}{k!} \tag{1} \]

for all τ ∈ [0, T], t ∈ [0, T − τ], and k ∈ {0, 1, . . .}.

The encoder maps the realization of the Poisson process to a message in {1, . . . , 2^{TR}}, where R is the description rate in bits per second. The reconstructor then maps this message to a {0, 1}-valued, Lebesgue-measurable signal x̂(t), t ∈ [0, T]. We wish to minimize the length of the region where x̂(t) = 1 while guaranteeing that all points of the original Poisson process lie in this region. See Figure 1 for an illustration.

More formally, we formulate this problem as a continuous-time rate-distortion problem, where the distortion between the point pattern x and the reproduction signal x̂ is defined as

\[ d(\mathbf{x}, \hat{x}) \triangleq \begin{cases} \dfrac{\mu_L\big(\hat{x}^{-1}(1)\big)}{T}, & \text{if all points in } \mathbf{x} \text{ are in } \hat{x}^{-1}(1) \\ \infty, & \text{otherwise,} \end{cases} \tag{2} \]

where x̂^{−1}(1) is the set of all t ∈ [0, T] such that x̂(t) = 1, and where μ_L(·) denotes the Lebesgue measure.
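In code, the distortion measure (2) can be sketched as follows; representing the covering set as a list of disjoint intervals is our assumption for illustration, not the paper's formalism:

```python
# Illustrative sketch of the distortion measure (2): normalized Lebesgue
# measure of the cover if it contains every point of the pattern, and
# infinity otherwise.
def distortion(points, cover, T):
    covered = lambda p: any(a <= p <= b for a, b in cover)
    if not all(covered(p) for p in points):
        return float("inf")        # a single missed point is not tolerated
    return sum(b - a for a, b in cover) / T

# A cover of total length 2 on [0, 10] containing both points:
print(distortion([1.0, 5.5], [(0.5, 1.5), (5.0, 6.0)], T=10.0))  # 0.2
# The same cover misses the point at t = 8:
print(distortion([1.0, 8.0], [(0.5, 1.5), (5.0, 6.0)], T=10.0))  # inf
```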

We say that (R, D) is an achievable rate-distortion pair for X if, for every ε > 0, there exists some T_0 > 0 such that, for every T > T_0, there exist an encoder f_T(·) and a reconstructor φ_T(·) of rate R + ε bits per second that, when applied to X on [0, T], result in

\[ \mathsf{E}\big[ d\big( \mathbf{X}, \varphi_T(f_T(\mathbf{X})) \big) \big] \le D + \varepsilon. \tag{3} \]

Denote by R(D, λ) the minimal rate R such that (R, D) is achievable for the Poisson process of intensity λ. Define

\[ R_{\mathrm{Pois}}(D, \lambda) \triangleq \begin{cases} -\lambda \log D \ \text{bits per second}, & D \in (0, 1) \\ 0, & D \ge 1. \end{cases} \tag{4} \]

Theorem 1 (Homogeneous Poisson). For all D, λ > 0,

\[ R(D, \lambda) = R_{\mathrm{Pois}}(D, \lambda). \tag{5} \]

Note: Theorem 1 can be alternatively expressed as

\[ D(R, \lambda) = 2^{-R/\lambda}, \quad R, \lambda > 0, \tag{6} \]

where D(R, λ) denotes the minimal expected distortion D such that (R, D) is achievable for the Poisson process of intensity λ.

A. Proof of Theorem 1 via Discretization

To prove Theorem 1, we propose a scheme to reduce the original problem to one for a discrete memoryless source. This is reminiscent of Wyner's scheme for reducing the peak-limited Poisson channel to a discrete memoryless channel [4]. We shall show the optimality of this scheme in Lemma 1, and we shall then prove Theorem 1 by computing the best rate that is achievable using this scheme.

Scheme 1. We divide the time interval [0, T] into T/∆ slots¹ of duration ∆. The encoder first maps the original point pattern x to a {0, 1}-valued vector x^∆ of (T/∆) components in the following way: if x has at least one point in the slot ((i−1)∆, i∆], then we set the ith component of x^∆ to 1. Otherwise, we set it to zero. The encoder then maps x^∆ to a message in {1, . . . , 2^{TR}}.

Based on the encoder's message, the reconstructor produces a {0, 1}-valued length-(T/∆) vector x̂^∆ that meets the distortion criterion

\[ \mathsf{E}\big[ d^\Delta\big( \mathbf{X}^\Delta, \hat{\mathbf{X}}^\Delta \big) \big] \le D + \varepsilon, \tag{7} \]

¹If T is not divisible by ∆, we replace it with T₀ = ⌈T/∆⌉∆. When ∆ tends to zero, the difference between RT and RT₀ tends to zero. Consequently, we shall ignore this edge effect and assume that T is divisible by ∆.


where the distortion measure d^∆(·, ·) between two vectors is defined as the average over the components of the single-letter distortion function

\[ d(0,0) = 0 \tag{8a} \]
\[ d(0,1) = 1 \tag{8b} \]
\[ d(1,0) = \infty \tag{8c} \]
\[ d(1,1) = 1. \tag{8d} \]

It then maps x̂^∆ to the piecewise-constant continuous-time signal x̂:

\[ \hat{x}(t) = \hat{x}^\Delta_{\lceil t/\Delta \rceil}, \quad t \in [0, T]. \tag{9} \]

Scheme 1 reduces the task of designing a code for X subject to the distortion d(·, ·) to the task of designing a code for the vector X^∆ subject to the distortion d^∆(·, ·) because

\[ d(\mathbf{x}, \hat{x}) = d^\Delta\big( \mathbf{x}^\Delta, \hat{\mathbf{x}}^\Delta \big). \tag{10} \]

When X is a Poisson process of intensity λ, the components of X^∆ are independent and identically distributed (IID), and each is Ber(1 − e^{−λ∆}). Let R^∆(D, λ) denote the rate-distortion function for X^∆ and d^∆(·, ·). If we combine Scheme 1 with an optimal code for X^∆ subject to E[d^∆(X^∆, X̂^∆)] ≤ D + ε, we can achieve any rate that is larger than

\[ \frac{R^\Delta(D, \lambda)}{\Delta} \ \text{bits per second}. \tag{11} \]
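As an illustrative sketch of the discretization in Scheme 1 (our own toy code, not the optimal code whose existence the theorem asserts), one can map a pattern to its slot-occupancy vector and cover exactly the occupied slots:

```python
import random

# Minimal sketch of the mapping in Scheme 1 (illustrative; covering the
# occupied slots themselves is a valid but suboptimal strategy).
def discretize(points, T, delta):
    """Map a point pattern on [0, T] to a binary slot-occupancy vector
    (slot boundaries handled with a floor convention)."""
    n = round(T / delta)                  # assume T divisible by delta
    x = [0] * n
    for p in points:
        x[min(int(p / delta), n - 1)] = 1
    return x

def cover_measure(x_hat, delta, T):
    """Distortion of the covering signal (9) built from the vector x_hat."""
    return sum(x_hat) * delta / T

random.seed(0)
T, delta = 100.0, 0.5
points = [random.uniform(0, T) for _ in range(20)]
x = discretize(points, T, delta)
# Marking the occupied slots never misses a point and costs at most one
# slot per point:
print(cover_measure(x, delta, T) <= len(points) * delta / T)  # True
```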

The next lemma, which is reminiscent of [5, Theorem 2.1], shows that when we let ∆ tend to zero, there is no loss in optimality in using Scheme 1.

Lemma 1. For all D, λ > 0,

\[ R(D, \lambda) = \lim_{\Delta \downarrow 0} \frac{R^\Delta(D, \lambda)}{\Delta}. \tag{12} \]

Proof: Clearly, for all ∆ > 0, R(D, λ) cannot exceed (11). To prove (12) we only need to show

\[ R(D, \lambda) \ge \lim_{\Delta \downarrow 0} \frac{R^\Delta(D, \lambda)}{\Delta}. \tag{13} \]

We prove this in the following way: given any rate-distortion code with 2^{TR} codewords x̂_m, m ∈ {1, . . . , 2^{TR}}, that achieves expected distortion D, we shall construct a new code that is realizable through Scheme 1, that contains (2^{TR} + 1) codewords, and that achieves an expected distortion that is arbitrarily close to D.

Denote the codewords of our new code by ŵ_m, where m ∈ {1, . . . , 2^{TR} + 1}. We choose the last codeword to be the constant 1. We next describe our choices of the other codewords. For every ε > 0 and every x̂_m, we can approximate the set {t : x̂_m(t) = 1} by a set A_m that is equal to a finite, say N_m, union of open intervals. More specifically,

\[ \mu_L\big( \hat{x}_m^{-1}(1) \,\triangle\, \mathcal{A}_m \big) \le \varepsilon\, 2^{-TR}, \tag{14} \]

where △ denotes the symmetric difference between two sets (see, e.g., [16, Chapter 3, Proposition 15]). Define

\[ \mathcal{B} \triangleq \bigcup_{m=1}^{2^{TR}} \big( \hat{x}_m^{-1}(1) \setminus \mathcal{A}_m \big), \tag{15} \]

Fig. 2. Constructing ŵ_m from A_m.

and note that by (14),

\[ \mu_L(\mathcal{B}) \le \varepsilon. \tag{16} \]

For each A_m, m ∈ {1, . . . , 2^{TR}}, define

\[ \mathcal{T}_m \triangleq \Big\{ t \in [0, T] : \big( (\lceil t/\Delta \rceil - 1)\Delta, \ \lceil t/\Delta \rceil \Delta \big] \cap \mathcal{A}_m \neq \emptyset \Big\}. \tag{17} \]

We now construct ŵ_m, m ∈ {1, . . . , 2^{TR}}, as

\[ \hat{w}_m = \mathbf{1}_{\mathcal{T}_m}, \tag{18} \]

where 1_S denotes the indicator function of the set S. Note that A_m ⊆ T_m = ŵ_m^{−1}(1). See Figure 2 for an illustration of this construction. Let

\[ N \triangleq \max_{m \in \{1, \ldots, 2^{TR}\}} N_m. \]

It can be seen that

\[ \mu_L\big( \hat{w}_m^{-1}(1) \big) - \mu_L(\mathcal{A}_m) \le 2N\Delta, \quad m \in \{1, \ldots, 2^{TR}\}. \tag{19} \]

Our encoder works as follows: if x contains no point in B, it maps x to the same message as the given encoder; otherwise it maps x to the index (2^{TR} + 1) of the all-one codeword.

To analyze the distortion, first consider the case where x contains no point in B. In this case, all points in x must be covered by the selected codeword ŵ_m. By (14) and (19), the difference d(x, ŵ_m) − d(x, x̂_m), if positive, can be made arbitrarily small by choosing ε and ∆ small (independently of x). Next consider the case where x does contain points in B. By (16), the probability that this happens can be made arbitrarily small by choosing ε small, therefore its contribution to the expected distortion can also be made arbitrarily small.

We conclude that our code {ŵ_m} can achieve a distortion that is arbitrarily close to the distortion achieved by the original code {x̂_m}.

We next proceed to prove Theorem 1. We derive R(D, λ) by computing the right-hand side of (12) using [17, Theorem 9.3.2], which is a generalization of Shannon's formula for the rate-distortion function of a discrete memoryless source [18] to unbounded distortion measures. This gives us

\[ R^\Delta(D, \lambda) = \min_{P_{\hat{X}|X} :\ \mathsf{E}[d(X, \hat{X})] \le D} I(X; \hat{X}). \tag{20} \]

First consider the case where D ∈ (0, 1). It is clear that, to satisfy E[d(X, X̂)] ≤ D, one must choose X̂ = 1 with probability one whenever X = 1. Assume that ∆ is small enough so that

\[ 1 - e^{-\lambda\Delta} < D. \tag{21} \]


Then the conditional probability for X̂ = 1 when X = 0 should be chosen such that the probability that X̂ = 1 is equal to D. Hence the optimal conditional distribution is

\[ P_{\hat{X}|X}(1|0) = 1 - e^{\lambda\Delta} + D e^{\lambda\Delta}, \tag{22a} \]
\[ P_{\hat{X}|X}(1|1) = 1. \tag{22b} \]

We compute the mutual information I(X; X̂) corresponding to this conditional law as follows:

\[ R^\Delta(D, \lambda) = I(X; \hat{X}) \tag{23} \]
\[ = H(\hat{X}) - H(\hat{X}|X) \tag{24} \]
\[ = H(\hat{X}) - e^{-\lambda\Delta} H(\hat{X}|X=0) - \big(1 - e^{-\lambda\Delta}\big) H(\hat{X}|X=1) \tag{25} \]
\[ = H_b(D) - e^{-\lambda\Delta} H_b\big(1 - e^{\lambda\Delta} + D e^{\lambda\Delta}\big), \tag{26} \]

where H_b(·) denotes the binary entropy function:

\[ H_b(p) = p \log\frac{1}{p} + (1-p) \log\frac{1}{1-p}, \quad p \in [0, 1]. \tag{27} \]

Using (12) and (26), and noting that the right-hand side of (26) vanishes at ∆ = 0, we have

\[ R(D, \lambda) = \frac{\partial}{\partial \Delta} \Big( H_b(D) - e^{-\lambda\Delta} H_b\big(1 - e^{\lambda\Delta} + D e^{\lambda\Delta}\big) \Big) \bigg|_{\Delta=0} \tag{28} \]
\[ = \lambda e^{-\lambda\Delta} H_b\big(1 - e^{\lambda\Delta} + D e^{\lambda\Delta}\big) + \lambda (1-D) \log \frac{e^{\lambda\Delta} - D e^{\lambda\Delta}}{1 - e^{\lambda\Delta} + D e^{\lambda\Delta}} \bigg|_{\Delta=0} \tag{29} \]
\[ = \lambda H_b(D) + \lambda (1-D) \log\frac{1-D}{D} \tag{30} \]
\[ = -\lambda \log D, \quad D \in (0, 1). \tag{31} \]

The case where D ≥ 1 is simple: one can use the constant signal X̂(t) = 1, t ∈ [0, T], as the reconstruction irrespective of the realization of X, hence

\[ R(D, \lambda) = 0, \quad D \ge 1. \tag{32} \]

Combining (31) and (32) yields (5).
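The limit computed in (28)–(31) can be checked numerically: dividing the right-hand side of (26) by ∆ and letting ∆ shrink should approach −λ log D. A small sketch (function names are ours):

```python
import math

# Numerical check of (26) and (31): the per-second rate of the discretized
# problem tends to -lambda*log2(D) as the slot duration Delta tends to zero.
def Hb(p):
    """Binary entropy in bits, as in (27)."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def R_delta(D, lam, delta):
    """Right-hand side of (26), divided by the slot duration delta."""
    q = 1 - math.exp(lam * delta) + D * math.exp(lam * delta)  # (22a)
    return (Hb(D) - math.exp(-lam * delta) * Hb(q)) / delta

D, lam = 0.25, 2.0
target = -lam * math.log2(D)           # -lambda*log2(D) = 4.0 bits per second
for delta in (0.1, 0.01, 0.001):
    print(round(R_delta(D, lam, delta), 4))  # approaches 4.0 from above
print(target)
```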

B. Alternative Proof of Converse

In the above proof of Theorem 1, the converse is based on Lemma 1, i.e., on the optimality of the discretization approach. We next provide an alternative proof of the converse, which does not rely on this optimality. One advantage of this proof over the previous one is that it can be applied to general measure spaces; see Section II-C.

We shall show that, for any positive ε and δ, for large enough T, any code that achieves expected distortion D < 1 must have rate at least

\[ R \ge (1-\delta)(\lambda - \varepsilon) \log\frac{1-\delta}{D}. \tag{33} \]

To this end, let K ≜ N_X(T) be the total number of points of X in the interval [0, T], and let M ∈ {1, . . . , 2^{TR}} be the label of the chosen codeword. Note that, assuming the expected distortion is finite (i.e., no arrival is missed), the distortion d(x, x̂) is uniquely determined by the realization of M = m: it is the Lebesgue measure of the mth reconstruction signal divided by T, which we denote d_m.

Conditional on K = k, the k arrivals in x, when randomly labeled, are IID uniformly distributed on [0, T]; see, e.g., [19, Theorem 2.4.6]. It then follows that the probability that the mth reconstruction signal covers all these k points is d_m^k, which implies

\[ P_{M|K}(m|k) \le d_m^k, \tag{34} \]

i.e.,

\[ k \log\frac{1}{d_m} \le \log\frac{1}{P_{M|K}(m|k)}. \tag{35} \]

The inequality is because the mth codeword may cover x and still not be picked if another codeword of smaller distortion also covers x. Averaging over the message m we have

\[ k \sum_{m=1}^{2^{TR}} P_{M|K}(m|k) \log\frac{1}{d_m} \le H(M \,|\, K = k) \le TR. \tag{36} \]

Because the function a ↦ log(1/a) is convex, we further have by (36) and Jensen's inequality

\[ TR \ge k \log \frac{1}{\mathsf{E}\big[ d(\mathbf{X}, \hat{X}) \,\big|\, K = k \big]}, \tag{37} \]

for all k = 0, 1, . . .. Averaging over k yields

\[ TR \ge \sum_{k=0}^{\infty} P_K(k)\, k \log \frac{1}{\mathsf{E}\big[ d(\mathbf{X}, \hat{X}) \,\big|\, K = k \big]}. \tag{38} \]

Let L be the event {K ≥ (λ − ε)T}. Since K has the Poisson distribution of mean λT, we know that, for large enough T, the probability of L is at least 1 − δ. This combined with the fact that d(x, x̂) is nonnegative implies

\[ \mathsf{E}\big[ d(\mathbf{X}, \hat{X}) \,\big|\, \mathcal{L} \big] \le \frac{\mathsf{E}\big[ d(\mathbf{X}, \hat{X}) \big]}{1-\delta} = \frac{D}{1-\delta}. \tag{39} \]

Continuing from (38) we have

\[ TR \ge \sum_{k \ge (\lambda-\varepsilon)T} P_K(k)\, k \log \frac{1}{\mathsf{E}\big[ d(\mathbf{X}, \hat{X}) \,\big|\, K = k \big]} \tag{40} \]
\[ \ge (\lambda - \varepsilon) T \sum_{k \ge (\lambda-\varepsilon)T} P_K(k) \log \frac{1}{\mathsf{E}\big[ d(\mathbf{X}, \hat{X}) \,\big|\, K = k \big]} \tag{41} \]
\[ \ge (1-\delta)(\lambda-\varepsilon) T \, \mathsf{E}\bigg[ \log \frac{1}{\mathsf{E}\big[ d(\mathbf{X}, \hat{X}) \,\big|\, K \big]} \,\bigg|\, \mathcal{L} \bigg] \tag{42} \]
\[ \ge (1-\delta)(\lambda-\varepsilon) T \log \frac{1}{\mathsf{E}\big[ d(\mathbf{X}, \hat{X}) \,\big|\, \mathcal{L} \big]} \tag{43} \]
\[ \ge (1-\delta)(\lambda-\varepsilon) T \log \frac{1-\delta}{D}, \tag{44} \]

where (42) follows because the probability of L is at least 1 − δ; (43) follows from the convexity of a ↦ log(1/a) and Jensen's inequality; and (44) follows from (39). This establishes (33).
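The geometric fact behind (34), namely that a fixed set covering a fraction d of the interval contains k IID uniform points with probability d^k, can be sanity-checked by simulation (all parameters below are illustrative choices of ours):

```python
import random

# Monte-Carlo check of the step leading to (34): conditional on K = k, the
# points are IID uniform on [0, T], so a fixed covering set of relative
# measure d contains all k of them with probability d**k.
random.seed(1)
T, k, trials = 1.0, 3, 100_000
cover = [(0.0, 0.3), (0.5, 0.8)]               # total measure d = 0.6
d = sum(b - a for a, b in cover)
inside = lambda p: any(a <= p <= b for a, b in cover)
hits = sum(all(inside(random.uniform(0, T)) for _ in range(k))
           for _ in range(trials))
print(round(hits / trials, 3), round(d ** k, 3))   # both near 0.216
```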


C. Fixed-Interval Formulation and General Measure Spaces

We now introduce an alternative formulation for Theorem 1. In this formulation, instead of letting the length of the interval tend to infinity, we keep the interval fixed and let the intensity go to infinity: the source X is a Poisson process of intensity T·λ on the interval [0, 1], where we let T tend to infinity. The distortion is now defined as

\[ d(\mathbf{x}, \hat{x}) \triangleq \begin{cases} \mu_L\big(\hat{x}^{-1}(1)\big), & \text{if all points in } \mathbf{x} \text{ are in } \hat{x}^{-1}(1) \\ \infty, & \text{otherwise.} \end{cases} \tag{45} \]

A rate-distortion pair (R, D) is said to be achievable if, for large enough T, there exists a codebook of size 2^{TR} with expected distortion arbitrarily close to D, and R(D, λ) is the smallest R such that (R, D) is achievable. It is easy to verify that R(D, λ) still satisfies (5), and that the proofs in Sections II-A and II-B are both valid (after minor adjustments).

The above formulation allows us to extend Theorem 1 to general measure spaces (instead of an interval on the real line with the Lebesgue measure), as well as to inhomogeneous Poisson processes. We next discuss general measure spaces; inhomogeneous Poisson processes will be discussed in Section III.

Consider any measurable space on a set S equipped with a measure μ(·) that satisfies the following:

• It is non-atomic: any measurable A ⊆ S such that μ(A) > 0 contains a measurable subset B ⊂ A such that 0 < μ(B) < μ(A).

• The measure of S itself is finite: μ(S) < ∞. Without loss of generality, let μ(S) = 1.

Then there exist Poisson processes on S with μ(·) as their "mean measure"; see, e.g., [20]. Specifically, for any λT > 0, there exists a Poisson process X on S such that the number of arrivals of X in any measurable A ⊆ S has the Poisson distribution of mean λT μ(A). We can hence formulate a rate-distortion problem by replacing the interval [0, 1] by S, and by replacing the Lebesgue measure μ_L(·) by μ(·).

It is easy to see that Theorem 1 holds on such measure spaces. First, note that Scheme 1 is applicable because μ(·) is non-atomic, hence the achievability part of Theorem 1 on S can be proven exactly as in Section II-A. The optimality of this discretization approach in Lemma 1, however, relies on certain properties of the Lebesgue measure on an interval, and is difficult to extend to S. But the converse proof in Section II-B is applicable to a homogeneous Poisson process on S.

The simplest examples for S are multi-dimensional Euclidean spaces.

III. INHOMOGENEOUS POISSON PROCESSES

In this section we consider the case where X is an inhomogeneous Poisson process. As in Section II-C, we fix the time interval and let the intensity of the Poisson process tend to infinity. Let X be a Poisson process on the interval [0, 1] of intensity T·λ(t), t ∈ [0, 1], where λ(·) is Lebesgue measurable, nonnegative, and bounded on [0, 1], and where T > 0 is a scaling factor. Hence the counting function N_X(·) satisfies

\[ \Pr\big[ N_{\mathbf{X}}(t+\tau) - N_{\mathbf{X}}(t) = k \big] = \frac{e^{-\alpha} \alpha^k}{k!} \tag{46} \]

for all τ ∈ [0, 1], t ∈ [0, 1 − τ], and k ∈ {0, 1, . . .}, where

\[ \alpha = T \int_t^{t+\tau} \lambda(s)\, ds. \tag{47} \]

We define the rate-distortion function R(D, λ) in the same way as in Section II-C.

Theorem 2 (Inhomogeneous Poisson). The rate-distortion function R(D, λ) is given by

\[ R(D, \boldsymbol{\lambda}) = \min_{\substack{\tilde{D} \colon [0,1] \to [0,1] \\ \int_0^1 \tilde{D}(t)\,dt \le D}} \ \int_0^1 \lambda(t) \log\frac{1}{\tilde{D}(t)}\, dt. \tag{48} \]

Moreover, if

\[ \sup_t \lambda(t) \le \frac{\bar{\lambda}}{D}, \tag{49} \]

where λ̄ ≜ ∫_0^1 λ(t) dt, then (48) simplifies to

\[ R(D, \boldsymbol{\lambda}) = \bar{\lambda} \log\frac{1}{D} + \bar{\lambda} \log \bar{\lambda} - \int_0^1 \lambda(t) \log \lambda(t)\, dt. \tag{50} \]

Note that, unlike most previous works on rate-distortion problems with memory [21], Theorem 2 does not require that the source be stationary or ergodic.

The proof of Theorem 2 is divided into three parts: the achievability part of (48), the converse part of (48), and the proof of (50).
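For a concrete, assumed intensity profile λ(t) = 1 + t (our choice for illustration), condition (49) holds with D = 1/2, and evaluating (48) at the profile D̃(t) = Dλ(t)/λ̄, which a standard Lagrange-multiplier argument suggests as the minimizer (the paper's own proof of (50) lies outside this excerpt), indeed reproduces (50) numerically:

```python
import math

# Numerical illustration of (48)-(50) for the assumed intensity
# lambda(t) = 1 + t on [0, 1]. With D = 0.5, condition (49) holds
# (sup lambda = 2 <= lambda_bar/D = 3), and the candidate profile
# D_tilde(t) = D*lambda(t)/lambda_bar makes (48) match (50).
n, D = 100_000, 0.5
ts = [(i + 0.5) / n for i in range(n)]           # midpoint rule on [0, 1]
lam = [1.0 + t for t in ts]
lam_bar = sum(lam) / n                           # integral of lambda = 1.5
Dt = [D * l / lam_bar for l in lam]              # candidate profile, all <= 1
rate_48 = sum(l * math.log2(1.0 / d) for l, d in zip(lam, Dt)) / n
rate_50 = (lam_bar * math.log2(1.0 / D) + lam_bar * math.log2(lam_bar)
           - sum(l * math.log2(l) for l in lam) / n)
print(round(rate_48, 6), round(rate_50, 6))      # the two values coincide
```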

A. Achievability of (48)

As in Section II-A, the coding scheme we propose is based on discretization, but now the length of each slot depends on T and is chosen to be (∆/T). The resulting discrete-time source X is a sequence of independent but not identically distributed binary random variables: X_i is Ber(1 − e^{−λ_i}), where

\[ \lambda_i \triangleq T \int_{(i-1)\Delta/T}^{i\Delta/T} \lambda(t)\, dt. \tag{51} \]

The discrete-time distortion function is again given by (8).

Fix a measurable function D̃(t), t ∈ [0, 1]. Our discrete-time codebook is generated as follows. We first generate 2^{TR} codewords (covering sets) independently, where in each codeword, the ith symbol is chosen to be 1 with probability

\[ D_i \triangleq \frac{T}{\Delta} \int_{(i-1)\Delta/T}^{i\Delta/T} \tilde{D}(t)\, dt, \tag{52} \]

independently of the other symbols in the codeword. We then append the all-one sequence to the codebook, so the total size of the codebook is 2^{TR} + 1.

The encoder maps x to the codeword with the smallest number of ones that covers x.
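A toy, heavily scaled-down sketch of the random codebook just described (all sizes and parameters are illustrative assumptions of ours, with a constant symbol probability in place of the slot-varying D_i):

```python
import random

# Toy sketch of the random-coding construction: random covering sets plus an
# all-one fallback; the encoder returns the covering codeword with the
# fewest ones.
random.seed(2)
n, Di, num_codewords = 40, 0.5, 256
codebook = [[1 if random.random() < Di else 0 for _ in range(n)]
            for _ in range(num_codewords)]
codebook.append([1] * n)                       # the appended all-one sequence

def encode(x):
    covering = [c for c in codebook
                if all(ci >= xi for ci, xi in zip(c, x))]
    return min(covering, key=sum)              # smallest number of ones

x = [1 if random.random() < 0.1 else 0 for _ in range(n)]
c = encode(x)
print(all(ci >= xi for ci, xi in zip(c, x)))   # True: every point is covered
```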


We next analyze the performance of this random codebook. We consider three different cases. The first is the case where x does not satisfy

\[ \sum_{i=1}^{T/\Delta} x_i \log\frac{1}{D_i} \le \sum_{i=1}^{T/\Delta} \lambda_i \log\frac{1}{D_i} + \varepsilon T, \tag{53} \]

where ε is a fixed positive number. We show that the probability that X does not satisfy (53) tends to zero as T tends to infinity. Indeed, we have

\[ \mathsf{E}\Bigg[ \sum_{i=1}^{T/\Delta} X_i \log\frac{1}{D_i} \Bigg] = \sum_{i=1}^{T/\Delta} \big(1 - e^{-\lambda_i}\big) \log\frac{1}{D_i} \tag{54} \]
\[ \le \sum_{i=1}^{T/\Delta} \lambda_i \log\frac{1}{D_i}, \tag{55} \]

whereas

\[ \operatorname{var}\Bigg( \sum_{i=1}^{T/\Delta} X_i \log\frac{1}{D_i} \Bigg) = \sum_{i=1}^{T/\Delta} \operatorname{var}(X_i) \Big( \log\frac{1}{D_i} \Big)^2 \tag{56} \]
\[ \le \sum_{i=1}^{T/\Delta} \mathsf{E}\big[ X_i^2 \big] \Big( \log\frac{1}{D_i} \Big)^2 \tag{57} \]
\[ = \sum_{i=1}^{T/\Delta} \big(1 - e^{-\lambda_i}\big) \Big( \log\frac{1}{D_i} \Big)^2 \tag{58} \]
\[ \le \sum_{i=1}^{T/\Delta} \lambda_i \Big( \log\frac{1}{D_i} \Big)^2. \tag{59} \]

The expressions on the right-hand side of (55) and (59) both tend to infinity linearly with T as T tends to infinity. It then follows by Chebyshev's inequality that the probability that X does not satisfy (53) tends to zero as T tends to infinity. Since the distortion for any x is at most one, we conclude that the total contribution to the expected distortion from this first case can be neglected.

The second case we consider is where x does satisfy (53) but no covering set except the all-one set covers x. According to our construction, the probability π that a randomly chosen covering set covers x is

\[ \pi \triangleq \prod_{j : x_j = 1} D_j \tag{60} \]
\[ = \prod_{i=1}^{T/\Delta} (D_i)^{x_i} \tag{61} \]
\[ = 2^{-\sum_{i=1}^{T/\Delta} x_i \log\frac{1}{D_i}} \tag{62} \]
\[ \ge 2^{-\big( \sum_{i=1}^{T/\Delta} \lambda_i \log\frac{1}{D_i} + \varepsilon T \big)}, \tag{63} \]

where the last inequality follows because we are now considering the case where (53) does hold. Since the first 2^{TR} covering sets are chosen IID, the probability that none of them covers x is (1 − π)^{2^{TR}}, which tends to zero as T tends to infinity provided that

\[ R > \lim_{T \to \infty} \frac{1}{T} \sum_{i=1}^{T/\Delta} \lambda_i \log\frac{1}{D_i} + \varepsilon \tag{64} \]
\[ = \int_0^1 \lambda(t) \log\frac{1}{\tilde{D}(t)}\, dt + \varepsilon, \tag{65} \]

where the last step follows from, e.g., [22, Theorem 8.8]. Hence, provided that R satisfies (65), the contribution to the expected distortion from this second case can also be neglected.

The last case is where x satisfies (53) and where one of the first 2^{TR} randomly generated codewords covers x. As we have shown, when R satisfies (65), the overall expected distortion is dominated by the expected distortion in this last case. Fix any x satisfying (53). Recalling that the encoder selects the smallest covering set that covers x, and since the minimum is upper-bounded by the average, the expected distortion associated with x and the randomly chosen codebook, conditional on the chosen covering set not being the all-one set, is upper-bounded by the expected size of a randomly generated covering set conditional on it covering x. This upper bound is easy to compute: conditional on a covering set X̂ that was randomly chosen as above covering x, its symbols X̂_i are independently distributed according to

\[ P_{\hat{X}_i}(1) = \begin{cases} 1, & x_i = 1 \\ D_i, & x_i = 0. \end{cases} \tag{66} \]

The conditional expectation of the distortion associated with a random covering set, conditional on it covering x, is

\[ \frac{\Delta}{T} \sum_{i=1}^{T/\Delta} \mathsf{E}\big[ \hat{X}_i \big] = \frac{\Delta}{T} \Bigg( \sum_{i=1}^{T/\Delta} D_i (1 - x_i) + \sum_{i=1}^{T/\Delta} x_i \Bigg) \tag{67} \]
\[ \le \frac{\Delta}{T} \sum_{i=1}^{T/\Delta} D_i + \frac{\Delta}{T} \sum_{i=1}^{T/\Delta} x_i \tag{68} \]
\[ = \int_0^1 \tilde{D}(t)\, dt + \frac{\Delta}{T} \sum_{i=1}^{T/\Delta} x_i. \tag{69} \]

Hence the expected distortion in the last case is upper-bounded by

\[ \int_0^1 \tilde{D}(t)\, dt + \frac{\Delta}{T} \sum_{i=1}^{T/\Delta} \mathsf{E}\Big[ X_i \,\Big|\, \mathbf{X} \text{ satisfies (53)} \Big]. \tag{70} \]

The second term in (70) vanishes as ∆ tends to zero because

\[ \sum_{i=1}^{T/\Delta} \mathsf{E}[X_i] \le T \int_0^1 \lambda(t)\, dt \tag{71} \]

for all T > 0, and because the probability that X satisfies (53) is bounded away from zero for every fixed ε. We have hence proved that, if R satisfies (65), then the expected distortion for our randomly generated codebook can be made arbitrarily close to ∫_0^1 D̃(t) dt. It then follows that there must exist a deterministic codebook that achieves this distortion, and the achievability part of (48) is established.
