Coding Theorems for Empirical Coordination

(1)

HAL Id: hal-01865569

https://hal.archives-ouvertes.fr/hal-01865569

Preprint submitted on 30 Jan 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Mael Le Treust

To cite this version:

Mael Le Treust. Coding Theorems for Empirical Coordination: Technical Report. 2015. �hal- 01865569�

(2)

Coding Theorems for Empirical Coordination – Technical Report

Maël Le Treust

ETIS, UMR 8051 / ENSEA, Université Cergy-Pontoise, CNRS, 6, avenue du Ponceau,

95014 CERGY-PONTOISE CEDEX, FRANCE

Email: [email protected]

C

ONTENTS

I Non-Causal Encoding and Decoding 3

I-A Achievability Proof . . . . 5 I-B Converse Proof . . . . 6

II Perfect Channel 8

II-A Achievability Proof . . . . 9 II-B Converse Proof . . . 11

III Lossless Decoding 14

III-A Achievability Proof . . . 16 III-B Converse Proof . . . 18

IV Separation between Source and Channel 22

IV-A Achievability Proof . . . 24 IV-B Converse Proof . . . 26

V Causal Decoding 30

V-A Achievability Proof . . . 32

(3)

VI Causal Encoding 38 VI-A Achievability Proof . . . 40 VI-B Converse Proof . . . 46

VII Strictly Causal Decoding 49

VII-A Achievability Proof . . . 51 VII-B Converse Proof . . . 54

VIIIStrictly Causal Encoding 56

VIII-AAchievability Proof . . . 58 VIII-BConverse Proof . . . 60

Appendix 65 References 76

(4)

I. N

ON

-C

AUSAL

E

NCODING AND

D

ECODING

U

ⁿ

S

ⁿ

Z

ⁿ

X

ⁿ

Y

ⁿ

V

ⁿ

P

_usz

C T D

Fig. 1. Non-Causal Encoding functionf:Uⁿ× Sⁿ→ Xⁿand Decoding functiong:Yⁿ× Zⁿ→ Vⁿ.

Theorem I.1 (Non-Causal Encoding and Decoding)

1) Joint probability distribution Q(u, s, z, x, y, v) is achievable if and only if it decomposes as follows:











Q(u, s, z) = P

usz

(u, s, z), Q(y|x, s) = T (y|x, s), Y − − (X, S) − − (U, Z), Z − − (U, S ) − − (X, Y ),

(1)

and P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ T (y|x, s) ⊗ Q(v|u, s, z, x, y) is achievable.

2) Joint probability distribution P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ T (y|x, s) ⊗ Q(v|u, s, z, x, y) is achievable if:

max

Q∈Q

I(W

₁

, W

₂

; Y, Z) − I(W

₁

, W

₂

; U, S)

> 0, (2)

3) Joint probability distribution P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ T (y|x, s) ⊗ Q(v|u, s, z, x, y) is not achievable if:

max

Q∈Q

I(W

₁

; Y, Z|W

₂

) − I(W

₂

; U, S |W

₁

)

< 0, (3)

where

Q

is the set of distributions Q ∈ ∆(U × S × Z × W

₁

× W

₂

× X × Y × V) with auxiliary random variables (W

₁

, W

₂

) that satisfies:









 P

(w1,w2)∈W1×W2Q(u, s, z, w1, w2, x, y, v)

=Pusz(u, s, z)⊗ Q(x|u, s)⊗ T(y|x, s)⊗ Q(v|u, s, z, x, y), Y −−(X, S)−−(U, Z, W1, W2),

Z−−(U, S)−−(X, Y, W1, W2), V −−(Y, Z, W1, W2)−−(U, S, X).

(5)

The probability distribution Q ∈

Q

decomposes as follows:

P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ Q(w

₁

, w

₂

|u, s, x) ⊗ T (y|x, s) ⊗ Q(v|y, z, w

₁

, w

₂

).

The supports of the auxiliary random variables (W

₁

, W

₂

) are bounded by max(|W

₁

|, |W

₂

|) ≤ (|B| + 1) · (|B| + 2) with B = U × S × Z × X × Y × V.

Remark I.2 The mutual informations in equation (2) are continuous over the set of probability distributions

Q. Moreover, Q

is compact since the supports of the auxiliary random variables (W

₁

, W

₂

) are finite and

Q

has equality constraints. As mentioned in [8] pp. 7083 and in [2] pp. 9, we can consider the maximum instead of the supremum in equation (2).

Remark I.3 The achievability result of Theorem I.1 without state informations S and Z was already stated in [1], with a unique auxiliary random variable W = (W

₁

, W

₂

).

Remark I.4 As mentioned in Theorem I.1 1), probability distribution Q(u, s, z, x, y, v) should satisfy

the marginal distributions over the source P

usz

(u, s, z), the channel T (y|x, s) and the Markov chains

representing the network topology. If a probability distribution Q(u, s, z, x, y, v) does not decomposes

with P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ T (y|x, s) ⊗ Q(v|u, s, z, x, y), then the error probability can not converge

to zero and this probability distribution is not achievable. This remark is valid for all coding theorems

presented in this document.

(6)

A. Achievability Proof

Denote by Q(u, s, z, x, w

₁

, w

₂

, y, v) ∈

Q

the joint probability distribution that achieves the maximum in equation (2). There exists δ > 0 and rate

R

≥

0

such that :

R

≥ I (W

₁

, W

₂

; U, S) + δ, (4)

R

≤ I (W

1

, W

2

; Y, Z) − δ. (5)

• Random codebook. We generate |M| = 2

^nR

pairs of sequences (W

₁ⁿ

(m), W

₂ⁿ

(m)) with index m ∈ M drawn from the i.i.d. marginal probability distribution Q

^⊗n_w₁_w₂

.

• Encoding function. The encoder observes the sequences of source symbols U

ⁿ

∈ U

ⁿ

and state symbols S

ⁿ

∈ S

ⁿ

. It finds the index m ∈ M such that the sequences (U

ⁿ

, S

ⁿ

, W

₁ⁿ

(m), W

₂ⁿ

(m)) ∈ A

^⋆n_ε

(Q) are jointly typical. Encoder sends the sequence X

ⁿ

drawn from the conditional probability distribution Q

^⊗n_x|usw

1w2

depending on sequences (U

ⁿ

, S

ⁿ

, W

₁ⁿ

(m), W

₂ⁿ

(m)).

• Decoding function. The decoder observes the pair of sequences (Y

ⁿ

, Z

ⁿ

) and finds the index m ∈ M such that the sequences (Y

ⁿ

, Z

ⁿ

, W

₁ⁿ

(m), W

₂ⁿ

(m)) ∈ A

^⋆n_ε

(Q) are jointly typical. Decoder returns the sequence V

ⁿ

drawn from the conditional probability distribution Q

^⊗n_v|yzw

1w2

depending on sequences (Y

ⁿ

, Z

ⁿ

, W

₁ⁿ

(m), W

₂ⁿ

(m)).

From the properties of typical sequences, packing and covering Lemmas stated in [9] pp. 27, 46 and 208, equations (4), (5), imply there exists a n ¯ ∈

N

such that the expected probability of error events are bounded by ε for all n ≥ ¯ n:

E_c

P

(U

ⁿ

, S

ⁿ

) ∈ / A

^⋆n_ε

(Q)

≤ ε, (6)

E_c

P

∀m ∈ M, (U

ⁿ

, S

ⁿ

, W

₁ⁿ

(m), W

₂ⁿ

(m)) ∈ / A

^⋆n_ε

(Q)

≤ ε, (7)

E_c

P

∃m

^′

6= m, s.t. (Y

ⁿ

, Z

ⁿ

, W

₁ⁿ

(m

^′

), W

₂ⁿ

(m

^′

)) ∈ A

^⋆n_ε

(Q)

≤ ε. (8)

For all n ≥ ¯ n, there exists a code c

^⋆

∈ C(n) such that sequences (U

ⁿ

, S

ⁿ

, Z

ⁿ

, W

₁ⁿ

(m), W

₂ⁿ

(m), X

ⁿ

, Y

ⁿ

, V

ⁿ

) ∈ A

^⋆n_ε

(Q) are jointly typical for distribution P

usz

(u, s, z) ⊗Q(x, w

₁

, w

₂

|u, s)⊗T (y|x, s) ⊗Q(v|y, z, w

₁

, w

₂

)

with probability more than 1 − 3ε.

The cardinality bound is stated in Lemma 6 in the Appendix. This concludes the achievability proof

(7)

B. Converse Proof

We introduce the random event of error E ∈ {0, 1} defined as follows:

E =

(

0 if Q

ⁿ

− Q

tv

≤ ε ⇐⇒ (U

ⁿ

, S

ⁿ

, Z

ⁿ

, X

ⁿ

, Y

ⁿ

, V

ⁿ

) ∈ A

^⋆n_ε

(Q), 1 if Q

ⁿ

− Q

tv

> ε ⇐⇒ (U

ⁿ

, S

ⁿ

, Z

ⁿ

, X

ⁿ

, Y

ⁿ

, V

ⁿ

) ∈ / A

^⋆n_ε

(Q).

(9) Consider a sequence of code c(n) ∈ C that achieves the probability distribution Q(u, s, z, x, y, v), i.e. for which the probability of error P

e

(c) = P (E = 1) is small. We have equations:

0 =

Xn

i=1

I (U

_i+1ⁿ

, S

_i+1ⁿ

, Y

_i+1ⁿ

, Z

_i+1ⁿ

; Y

i

, Z

i

|Y

ⁱ⁻¹

, Z

ⁱ⁻¹

)

−

Xn

i=1

I (Y

ⁱ⁻¹

, Z

ⁱ⁻¹

; U

i

, S

i

, Y

i

, Z

i

|U

_i+1ⁿ

, S

_i+1ⁿ

, Y

_i+1ⁿ

, Z

_i+1ⁿ

) (10)

≤

Xn

i=1

I (U

_i+1ⁿ

, S

_i+1ⁿ

, Y

_i+1ⁿ

, Z

_i+1ⁿ

; Y

_i

, Z

_i

|Y

ⁱ⁻¹

, Z

ⁱ⁻¹

)

−

Xn

i=1

I (Y

ⁱ⁻¹

, Z

ⁱ⁻¹

; U

_i

, S

_i

|U

_i+1ⁿ

, S

_i+1ⁿ

, Y

_i+1ⁿ

, Z

_i+1ⁿ

) (11)

=

Xn

i=1

I (W

1,i

; Y

i

, Z

i

|W

2,i

) −

Xn

i=1

I (W

2,i

; U

i

, S

i

|W

1,i

). (12) Equation (10) comes from Csiszár Sum Identity stated pp. 25 in [9].

Equation (11) comes from the properties of the mutual information.

Equation (12) comes from the introduction of the auxiliary random variables W

_1,i

= (U

_i+1ⁿ

, S

_i+1ⁿ

, Y

_i+1ⁿ

, Z

_i+1ⁿ

) and W

_2,i

= (Y

ⁱ⁻¹

, Z

ⁱ⁻¹

). The pair of random variables (W

_1,i

, W

_2,i

) satisfy the three Markov Chains that correspond to the set of probability distributions

Q:

Y

_i

− − (X

_i

, S

_i

) − − (U

_i

, Z

_i

, W

_1,i

, W

_2,i

), (13) Z

i

− − (U

i

, S

i

) − − (X

i

, Y

i

, W

_1,i

, W

_2,i

), (14) V

_i

− − (Y

_i

, Z

_i

, W

_1,i

, W

_2,i

) − − (U

_i

, S

_i

, X

_i

). (15)

• The first Markov chain comes from the memoryless property of the channel and the fact that Y

_i

does not belong to (W

_1,i

, W

_2,i

).

• The second Markov chain comes from the i.i.d. property of the source and the fact that Z

i

does not belong to (W

_1,i

, W

_2,i

).

• The third Markov chain comes from the non-causal decoding: V

_i

is a function of the pair (Y

ⁿ

, Z

ⁿ

)

that is included in (Y

i

, Z

i

, W

1,i

, W

2,i

).

(8)

0 ≤

Xn

i=1

I (W

_1,i

; Y

i

, Z

i

|W

_2,i

) −

Xn

i=1

I (W

_2,i

; U

i

, S

i

|W

_1,i

)

≤ n ·

I (W

_1,T

; Y

_T

, Z

_T

|W

_2,T

, T ) − I(W

_2,T

; U

_T

, S

_T

|W

_1,T

, T )

(16)

≤ n ·

I (W

_1,T

, T ; Y

_T

, Z

_T

|W

_2,T

) − I(W

_2,T

; U

_T

, S

_T

|W

_1,T

, T )

(17)

≤ n · max

Q∈Q

I(W

₁

; Y

_T

, Z

_T

|W

₂

) − I (W

₂

; U

_T

, S

_T

|W

₁

)

(18)

≤ n · max

Q∈Q

I(W

₁

; Y

_T

, Z

_T

|W

₂

, E = 0) − I(W

₂

; U

_T

, S

_T

|W

₁

, E = 0) + ε

(19)

≤ n · max

Q∈Q

I(W

₁

; Y, Z|W

₂

) − I(W

₂

; U, S |W

₁

) + 2ε

. (20)

Equation (16) comes from the introduction of the uniform random variable T over {1, . . . , n} and the introduction of the corresponding mean random variables U

_T

, S

_T

, Z

_T

, W

_1,T

, W

_2,T

, X

_T

, Y

_T

, Z

_T

. Equation (17) comes from the properties of the mutual information.

Equation (18) comes from identifying W

₁

with (W

_1,T

, T ) and W

₂

with W

_2,T

and taking the maximum over the probability distributions Q that belong to

Q. This is possible since the random variables

(W

_1,T

, T ) and W

_2,T

satisfy the Markov chains of the set of probability distributions

Q, as stated in Lemma 7 in

the Appendix.

Equation (19) comes from the empirical coordination requirement as stated in Lemma 8. Sequences are not jointly typical with small error probability P(E = 1).

Equation (20) comes from Lemma 9 that states that the probability distribution induced by the coding scheme P (U

T

, S

T

, Z

T

, X

T

, Y

T

, V

T

) = (u, s, z, x, y, v)

E = 0

is closed to the target probability

distribution Q(u, s, z, x, y, v). The continuity of the entropy function stated pp. 33 in [10] concludes

the converse proof of Theorem I.1.

(9)

II. P

ERFECT

C

HANNEL

U

ⁿ

S

ⁿ

Z

ⁿ

X

ⁿ

V

ⁿ

P

_usz

C D

Fig. 2. The perfect channel is defined byT_y|xs=

1

^(y|x)and the decoding is lossy.

Theorem II.1 (Perfect Channel)

1) The joint probability distribution Q(u, s, z, x, v) is achievable if and only if it decomposes as follows:







Q(u, s, z) = P

usz

(u, s, z), Z − − (U, S ) − − X,

(21) and P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ Q(v|u, s, z, x) is achievable.

2) The probability distribution P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ Q(v|u, s, z, x) is achievable if:

Q∈Q

max

p

I (W

₂

; Z|X) + H(X) − I (X, W

₂

; U, S )

> 0, (22)

3) The probability distribution P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ Q(v|u, s, z, x) is not achievable if:

Q∈Q

max

p

I (W

₂

; Z|X) + H(X) − I (X, W

₂

; U, S )

< 0, (23)

where

Q_p

is the set of distributions Q ∈ ∆(U × S × Z × W

₂

× X × V) with auxiliary random variable W

2

that satisfies:









 P

w2∈W2Q(u, s, z, x, w2, v)

=Pusz(u, s, z)⊗ Q(x|u, s)⊗ Q(v|u, s, z, x), Z−−(U, S)−−(X, W2),

V −−(X, Z, W2)−−(U, S).

The probability distribution Q ∈

Q_p

decomposes as follows:

Q(u, s, z, x, w

₂

, v) = P

_usz

(u, s, z) ⊗ Q(x|u, s) ⊗ Q(w

₂

|u, s, x) ⊗ Q(v|x, z, w

₂

).

The support of the auxiliary random variable W

2

is bounded by |W

2

| ≤ |B| + 1 with B = U × S × Z × X × Y × V .

Remark II.2 This result generalizes the coding theorem of Wyner Ziv stated in [2].

(10)

A. Achievability Proof

This proof can be obtained from achievability result of Sec. I-A, by replacing random variables W

₁

and Y by X. Note that I (W

₂

; Z|X) + H(X) − I(X, W

₂

; U, S ) = I (X, W

₂

; X, Z) − I(X, W

₂

; U, S).

Denote by Q(u, s, z, x, w

₂

, v) ∈

Qp

the joint probability distribution that achieves the maximum in equation (22). There exists δ > 0 and rate

R

≥

0

such that :

R

≥ I(X, W

₂

; U, S) + δ, (24)

R

≤ I(X, W

₂

; X, Z) − δ. (25)

• Random codebook. We generate |M| = 2

^nR

pairs of sequences (X

ⁿ

(m), W

₂ⁿ

(m)) with index m ∈ M drawn from the i.i.d. marginal probability distribution Q

^⊗n_xw₂

.

• Encoding function. The encoder observes the sequences of source symbols U

ⁿ

∈ U

ⁿ

and state symbols S

ⁿ

∈ S

ⁿ

. It finds the index m ∈ M such that the sequences (U

ⁿ

, S

ⁿ

, X

ⁿ

(m), W

₂ⁿ

(m)) ∈ A

^⋆n_ε

(Q) are jointly typical. Encoder sends the corresponding sequence X

ⁿ

(m) through the channel.

• Decoding function. The decoder observes the pair of sequences (X

ⁿ

, Z

ⁿ

) and finds the index m ∈ M such that the sequences (X

ⁿ

, Z

ⁿ

, X

ⁿ

(m), W

₂ⁿ

(m)) ∈ A

^⋆n_ε

(Q) are jointly typical (for probability distribution Q

xzw2

⊗ 1

x|x

). Decoder returns the sequence V

ⁿ

drawn from the conditional probability distribution Q

^⊗n_v|xzw

2

depending on sequences (X

ⁿ

, Z

ⁿ

, W

₂ⁿ

(m)).

From the properties of typical sequences, packing and covering Lemmas stated in [9] pp. 27, 46 and 208, equations (24), (25), imply there exists a n ¯ ∈

N

such that the expected probability of error events are bounded by ε for all n ≥ n: ¯

E_c

P

(U

ⁿ

, S

ⁿ

) ∈ / A

^⋆n_ε

(Q)

≤ ε, (26)

E_c

P

∀m ∈ M, (U

ⁿ

, S

ⁿ

, X

ⁿ

(m), W

₂ⁿ

(m)) ∈ / A

^⋆n_ε

(Q)

≤ ε, (27)

E_c

P

∃m

^′

6= m, s.t. (X

ⁿ

, Z

ⁿ

, X

ⁿ

(m

^′

), W

₂ⁿ

(m

^′

)) ∈ A

^⋆n_ε

(Q)

≤ ε. (28)

For all n ≥ n, there exists a code ¯ c

^⋆

∈ C(n) such that sequences (U

ⁿ

, S

ⁿ

, Z

ⁿ

, X

ⁿ

(m), W

₂ⁿ

(m), V

ⁿ

) ∈

A

^⋆n_ε

(Q) are jointly typical for distribution P

usz

(u, s, z) ⊗ Q(x, w

₂

|u, s) ⊗ Q(v|x, z, w

₂

) with probability

more than 1 − 3ε.

(11)

The cardinality bound is stated in Lemma 6 in the Appendix. This concludes the achievability proof

of Theorem II.1.

(12)

B. Converse Proof

We introduce the random event of error E ∈ {0, 1} defined as follows:

E =

(

0 if Q

ⁿ

− Q

tv

≤ ε ⇐⇒ (U

ⁿ

, S

ⁿ

, Z

ⁿ

, X

ⁿ

, V

ⁿ

) ∈ A

^⋆n_ε

(Q), 1 if Q

ⁿ

− Q

tv

> ε ⇐⇒ (U

ⁿ

, S

ⁿ

, Z

ⁿ

, X

ⁿ

, V

ⁿ

) ∈ / A

^⋆n_ε

(Q).

(29) Consider a sequence of code c(n) ∈ C that achieves the probability distribution Q(u, s, z, x, v), i.e. for which the probability of error P

e

(c) = P (E = 1) is small. We have equations:

0 = H(X

ⁿ

, Z

ⁿ

|E = 0) − I (X

ⁿ

, Z

ⁿ

; U

ⁿ

, S

ⁿ

|E = 0) − H(X

ⁿ

, Z

ⁿ

|U

ⁿ

, S

ⁿ

, E = 0) (30)

≤

Xn

i=1

H(X

_i

, Z

_i

|E = 0) −

Xn

i=1

I(X

ⁿ

, Z

ⁿ

; U

_i

, S

_i

|U

_i+1ⁿ

, S

_i+1ⁿ

, E = 0)

− H(X

ⁿ

, Z

ⁿ

|U

ⁿ

, S

ⁿ

, E = 0) (31)

=

Xn

i=1

I(X

ⁿ

, Z

ⁿ

, U

_i+1ⁿ

, S

_i+1ⁿ

; X

i

, Z

i

|E = 0) −

Xn

i=1

I (X

ⁿ

, Z

ⁿ

, U

_i+1ⁿ

, S

ⁿ_i+1

; U

i

, S

i

|E = 0)

− H(X

ⁿ

, Z

ⁿ

|U

ⁿ

, S

ⁿ

, E = 0) + n · ε (32)

=

Xn

i=1

I(X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

; X

i

, Z

i

|E = 0) −

Xn

i=1

I(X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

; U

i

, S

i

|E = 0) +

Xn

i=1

I(Z

i

; X

i

, Z

i

|X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

, E = 0) −

Xn

i=1

I(Z

i

; U

i

, S

i

|X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

, E = 0)

− H(X

ⁿ

, Z

ⁿ

|U

ⁿ

, S

ⁿ

, E = 0) + n · ε (33)

=

Xn

i=1

I(X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

; X

i

, Z

i

|E = 0) −

Xn

i=1

I(X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

; U

i

, S

i

|E = 0) +

Xn

i=1

H(Z

i

|X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

, U

i

, S

i

, E = 0) − H(X

ⁿ

, Z

ⁿ

|U

ⁿ

, S

ⁿ

, E = 0) + n · ε (34)

≤

Xn

i=1

I(X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

; X

i

, Z

i

|E = 0) −

Xn

i=1

I(X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

; U

i

, S

i

|E = 0) +

Xn

i=1

H(Z

i

|U

i

, S

i

, E = 0) − H(Z

ⁿ

|U

ⁿ

, S

ⁿ

, E = 0) + n · 2ε (35)

≤

Xn

i=1

I(X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

; X

_i

, Z

_i

|E = 0) −

Xn

i=1

I(X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

; U

_i

, S

_i

|E = 0) + n · 3ε (36)

=

Xn

i=1

I(X

i

, W

_2,i

; X

i

, Z

i

|E = 0) − I(X

i

, W

_2,i

; U

i

, S

i

|E = 0) + n · 3ε. (37)

Equations (30) and (31) come from the properties of the mutual information.

(13)

that implies

Pn

i=1

I(U

_i+1ⁿ

, S

_i+1ⁿ

; U

i

, S

i

|E = 0) ≤ n · ε.

Equations (33) and (34) come from the properties of the mutual information.

Equation (35) comes from the i.i.d. property of the information source (U, S, Z) and Lemma 10 in the Ap- pendix that implies

Pn

i=1

H(Z

i

|X

ⁿ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

, U

i

, S

i

, E = 0)−

Pn

i=1

H(Z

i

|U

i

, S

i

, E = 0) ≤ n ·ε.

Equation (36) comes from the i.i.d. property of the information source (U, S, Z) and Lemma 10 in the Appendix that implies

Pn

i=1

H(Z

i

|U

i

, S

i

, E = 0) − H(Z

ⁿ

|U

ⁿ

, S

ⁿ

, E = 0) ≤ n · ε.

Equation (37) comes from the introduction of the auxiliary random variable W

_2,i

= (X

⁻ⁱ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

). The random variable W

_2,i

satisfy the Markov Chains that correspond to the set of probability distributions

Qp

corresponding to the perfect channel:

Z

_i

− − (U

_i

, S

_i

) − − (X

_i

, W

_2,i

), (38) V

i

− − (X

i

, Z

i

, W

_2,i

) − − (U

i

, S

i

). (39)

• The first Markov chain comes from i.i.d. property of the source and the fact that Z

i

does not belong to W

_2,i

.

• The second Markov chain comes from the non-causal decoding: V

i

is a function of (X

ⁿ

, Z

ⁿ

) that is included in (X

i

, Z

i

, W

_2,i

) = (X

i

, Z

i

, X

⁻ⁱ

, Z

⁻ⁱ

, U

_i+1ⁿ

, S

_i+1ⁿ

).

0 ≤

Xn

i=1

I(X

_i

, W

_2,i

; X

_i

, Z

_i

|E = 0) − I (X

_i

, W

_2,i

; U

_i

, S

_i

|E = 0) + 3nε

= n ·

I (X

T

, W

_2,T

; X

T

, Z

T

|T, E = 0) − I (X

T

, W

_2,T

; U

T

, S

T

|T, E = 0) + 3ε

(40)

≤ n ·

I (X

T

, W

_2,T

, T ; X

T

, Z

T

|E = 0) − I (X

T

, W

_2,T

, T ; U

T

, S

T

|E = 0) + 3ε

(41)

≤ n · max

Q∈Qp

I (X

T

, W

₂

; X

T

, Z

T

|E = 0) − I(X

T

, W

₂

; U

T

, S

T

|E = 0) + 3ε

(42)

≤ n · max

Q∈Qp

I (X, W

₂

; X, Z ) − I (X, W

₂

; U, S) + 4ε

. (43)

Equation (40) comes from the introduction of the uniform random variable T over {1, . . . , n} and the introduction of the corresponding mean random variables U

T

, S

T

, Z

T

, W

_2,T

, X

T

, V

T

.

Equation (41) comes from the i.i.d. property of the information source (U, S) and Lemma 11 in the Appendix that implies I (T ; U

T

, S

T

|E = 0) = 0.

Equation (42) comes from identifying W

₂

with (W

_2,T

, T ) and taking the maximum over the probability

distributions that belong to

Qp

. This is possible since the random variables (W

_2,T

, T ) satisfy the Markov

chains of the set of probability distributions

Q_p

, as stated in Lemma 7 in the Appendix.

(14)

Equation (43) comes from Lemma 9 that states that the probability distribution induced by the coding scheme P (U

T

, S

T

, Z

T

, X

T

, V

T

) = (u, s, z, x, v)

E = 0

is closed to the target probability distribution

Q(u, s, z, x, y, v). The continuity of the entropy function stated pp. 33 in [10] concludes the converse

proof of Theorem II.1.

(15)

III. L

OSSLESS

D

ECODING

U

ⁿ

S

ⁿ

Z

ⁿ

X

ⁿ

Y

ⁿ

U ˆ

ⁿ

P

usz

C T D

Fig. 3. Noisy channelT(y|x, s)and lossless decoding

1

^(ˆ^u|u).

Theorem III.1 (Lossless Decoding)

1) The joint probability distribution Q(u, s, z, x, y, u) ˆ is achievable if and only if it decomposes as follows:











Q(u, s, z) = P

_usz

(u, s, z), Q(y|x, s) = T (y|x, s), Q(ˆ u|u) = 1 ^(ˆ ^u|u), Y − − (X, S) − − (U, Z), Z − − (U, S ) − − (X, Y ),

(44)

and P

_usz

(u, s, z) ⊗ Q(x|u, s) ⊗ T (y|x, s) ⊗ 1 ^(ˆ ^u|u) is achievable.

2) The probability distribution P

_usz

(u, s, z) ⊗ Q(x|u, s) ⊗ T (y|x, s) ⊗ 1 ^(ˆ ^u|u) is achievable if:

max

Q∈Ql

I(U, W

₁

; Y, Z) − I (W

₁

; S|U ) − H(U )

> 0, (45)

3) The probability distribution P

_usz

(u, s, z) ⊗ Q(x|u, s) ⊗ T (y|x, s) ⊗ 1 ^(ˆ ^u|u) is not achievable if:

max

Q∈Ql

I(U, W

₁

; Y, Z) − I (W

₁

; S|U ) − H(U )

< 0, (46)

where

Q_l

is the set of distributions Q ∈ ∆(U × S × Z × W

1

× X × Y × U) with auxiliary random variable W

1

that satisfies:









 P

w1∈W1Q(u, s, z, w1, x, y,ˆu)

=Pusz(u, s, z)⊗ Q(x|u, s)⊗ T(y|x, s)⊗

1

^(ˆ^u|u),

Y −−(X, S)−−(U, Z, W1), Z−−(U, S)−−(X, Y, W1).

The probability distribution Q ∈

Q_l

decomposes as follows:

Q(u, s, z, w

1

, x, y, u) = ˆ P

usz

(u, s, z) ⊗ Q(x|u, s) ⊗ Q(w

1

|u, s, x) ⊗ T (y|x, s) ⊗ Q(ˆ u|u).

(16)

The support of the auxiliary random variables W

₁

is bounded by |W

₁

| ≤ |B| + 1 with B = U × S × Z × X × Y × V .

Remark III.2 This result was already stated in [4] and [5] with a more restrictive lossless decoding

constraint: P ( ˆ U

ⁿ

6= U

ⁿ

) ≤ ε. It generalizes the coding theorem of Gel’fand Pinsker stated in [3].

(17)

A. Achievability Proof

This proof can be obtained from the achievability result of Sec. I-A, by replacing random variables W

₂

and V by U . Note that I(U, W

₁

; Y, Z) − I (W

₁

; S|U ) − H(U ) = I (W

₁

, U ; Y, Z) − I(W

₁

, U ; U, S ).

Denote by Q(u, s, z, w

1

, x, y, u) ˆ ∈

Q_l

the joint probability distribution that achieves the maximum in equation (45). There exists δ > 0 and rate

R

≥

0

such that :

R

≥ I (W

1

, U ; U, S) + δ, (47)

R

≤ I (W

₁

, U ; Y, Z) − δ. (48)

• Random codebook. We generate |M| = 2

^nR

pairs of sequences (W

₁ⁿ

(m), U

ⁿ

(m)) with index m ∈ M drawn from the i.i.d. marginal probability distribution Q

^⊗n_w₁_u

.

• Encoding function. The encoder observes the sequences of source symbols U

ⁿ

∈ U

ⁿ

and state symbols S

ⁿ

∈ S

ⁿ

. It finds the index m ∈ M such that the sequences (U

ⁿ

, S

ⁿ

, W

₁ⁿ

(m), U

ⁿ

(m)) ∈ A

^⋆n_ε

(Q) are jointly typical (for probability distribution Q

_usw₁

⊗ 1

u|u

). Encoder sends a sequence X

ⁿ

drawn from the conditional probability distribution Q

^⊗n_x|usw

1

depending on sequences (S

ⁿ

, W

₁ⁿ

(m), U

ⁿ

(m)).

• Decoding function. The decoder observes the pair of sequences (Y

ⁿ

, Z

ⁿ

) and finds the index m ∈ M such that the sequences (Y

ⁿ

, Z

ⁿ

, W

₁ⁿ

(m), U

ⁿ

(m)) ∈ A

^⋆n_ε

(Q) are jointly typical. Decoder returns the sequence U ˆ

ⁿ

= U

ⁿ

(m).

From the properties of typical sequences, packing and covering Lemmas stated in [9] pp. 27, 46 and 208, equations (47), (48), imply there exists a n ¯ ∈

N

such that the expected probability of error events are bounded by ε for all n ≥ n: ¯

Ec

P

(U

ⁿ

, S

ⁿ

) ∈ / A

^⋆n_ε

(Q)

≤ ε, (49)

Ec

P

∀m ∈ M, (U

ⁿ

, S

ⁿ

, W

₁ⁿ

(m), U

ⁿ

(m)) ∈ / A

^⋆n_ε

(Q)

≤ ε, (50)

E_c

P

∃m

^′

6= m, s.t. (Y

ⁿ

, Z

ⁿ

, W

₁ⁿ

(m

^′

), U

ⁿ

(m

^′

)) ∈ A

^⋆n_ε

(Q)

≤ ε. (51)

For all n ≥ n, there exists a code ¯ c

^⋆

∈ C(n) such that sequences (U

ⁿ

, S

ⁿ

, Z

ⁿ

, W

₁ⁿ

(m), X

ⁿ

, Y

ⁿ

, U ˆ

ⁿ

) ∈

A

^⋆n_ε

(Q) are jointly typical for distribution P

usz

(u, s, z) ⊗ Q(x, w

₁

|u, s) ⊗ T (y|x, s) ⊗ 1 ^(ˆ ^u|u) ^with

probability more than 1 − 3ε.

(18)

The cardinality bound is stated in Lemma 6 in the Appendix. This concludes the achievability proof of Theorem III.1.

An alternative achievability proof based on superposition coding can be found in [4] and [5].

(19)

B. Converse Proof

We introduce the random event of error E ∈ {0, 1} defined as follows:

E =

(

0 if Q

ⁿ

− Q

tv

≤ ε ⇐⇒ (U

ⁿ

, S

ⁿ

, Z

ⁿ

, X

ⁿ

, Y

ⁿ

, U ˆ

ⁿ

) ∈ A

^⋆n_ε

(Q), 1 if Q

ⁿ

− Q

tv

> ε ⇐⇒ (U

ⁿ

, S

ⁿ

, Z

ⁿ

, X

ⁿ

, Y

ⁿ

, U ˆ

ⁿ

) ∈ / A

^⋆n_ε

(Q).

(52) Consider a sequence of code c(n) ∈ C that achieves the probability distribution Q(u, s, z, x, y, u), ˆ i.e.

for which the probability of error P

e

(c) = P (E = 1) is small. We have equations:

n · H(U ) = H(U

ⁿ

) (53)

= H(E) + H(U

ⁿ

|E) − H(E|U

ⁿ

) (54)

≤ 1 + P(E = 0) · H(U

ⁿ

|E = 0) + P(E = 1) · H(U

ⁿ

|E = 1) (55)

≤ 1 + H(U

ⁿ

|E = 0) + P (E = 1) · n · log

₂

|U | (56)

≤ H(U

ⁿ

|E = 0) + n · ε. (57)

Equation (53) comes from the i.i.d. property of the source.

Equations (54), (55) and (56) comes from the properties of the mutual information.

Equation (57) comes from the hypothesis of small error probability P

e

(c) = P (E = 1) and large length

n ∈

N

of the codewords, hence

_n¹

+ P(E = 1) · log

₂

|U | ≤ ε.

(20)

H(U

ⁿ

|E = 0) = I(U

ⁿ

; Y

ⁿ

, Z

ⁿ

|E = 0) + H(U

ⁿ

|Y

ⁿ

, Z

ⁿ

, E = 0) (58)

= I(U

ⁿ

; Y

ⁿ

, Z

ⁿ

|E = 0) + n · ε (59)

=

Xn

i=1

I(U

ⁿ

; Y

_i

, Z

_i

|Y

ⁱ⁻¹

, Z

ⁱ⁻¹

, E = 0) + n · ε (60)

≤

Xn

i=1

I(U

ⁿ

, Y

ⁱ⁻¹

, Z

ⁱ⁻¹

; Y

_i

, Z

_i

|E = 0) + n · ε (61)

=

Xn

i=1

I(U

ⁿ

, Y

ⁱ⁻¹

, Z

ⁱ⁻¹

, S

_i+1ⁿ

; Y

i

, Z

i

|E = 0)

−

Xn

i=1

I(S

_i+1ⁿ

; Y

i

, Z

i

|U

ⁿ

, Y

ⁱ⁻¹

, Z

ⁱ⁻¹

, E = 0) + n · ε (62)

=

Xn

i=1

I(U

ⁿ

, Y

ⁱ⁻¹

, Z

ⁱ⁻¹

, S

_i+1ⁿ

; Y

i

, Z

i

|E = 0)

−

Xn

i=1

I(S

i

; Y

ⁱ⁻¹

, Z

ⁱ⁻¹

|U

ⁿ

, S

_i+1ⁿ

, E = 0) + n · ε (63)

=

Xn

i=1

I(U

i

, U

⁻ⁱ

, Y

ⁱ⁻¹

, Z

ⁱ⁻¹

, S

_i+1ⁿ

; Y

i

, Z

i

|E = 0)

−

Xn

i=1

I(S

_i

; U

⁻ⁱ

, Y

ⁱ⁻¹

, Z

ⁱ⁻¹

, S

_i+1ⁿ

|U

_i

, E = 0) + n · 2ε (64)

=

Xn

i=1

I(U

i

, W

1,i

; Y

i

, Z

i

|E = 0) −

Xn

i=1

I(W

1,i

; S

i

|U

i

, E = 0) + n · 2ε. (65) Equation (58) comes from the properties of the mutual information.

Equation (59) comes from Fano’s inequality that implies H(U

ⁿ

|Y

ⁿ

, Z

ⁿ

, E = 0) ≤ n · ε, as stated in Lemma 1.

Equation (60), (61) and (62) come from the properties of the mutual information.

Equation (63) comes from Csiszár Sum Identity stated pp. 25 in [9].

Equation (64) comes from the independence of random variables (U

i

, S

i

) with (U

⁻ⁱ

, S

_i+1ⁿ

) that implies that

Pn

i=1

I(S

i

; U

⁻ⁱ

, S

_i+1ⁿ

|U

i

, E = 0) ≤ n · ε, as stated in Lemma 10 in the Appendix.

Equation (65) comes from the introduction of the auxiliary random variable W

1,i

= (U

⁻ⁱ

, Y

ⁱ⁻¹

, S

_i+1ⁿ

).

The Markov chain property Y

_i

− − (X

_i

, S

_i

) − − (U

_i

, W

_1,i

) is satisfied for all i ∈ {1, . . . , n} since the

channel is memoryless and Y

i

do not belong to W

_1,i

. The random variable W

_1,i

belongs to the set of

probability distributions

Q_l

for lossless decoding.

(21)

H(Uⁿ|E = 0) ≤ Xn i=1

I(Ui, W1,i;Yi, Zi|E= 0)− Xn i=1

I(W1,i;Si|Ui, E= 0) +n·2ε

= n·

I(UT, W1,T;YT, ZT|T, E= 0)−I(W1,T;ST|UT, T, E = 0) + 2ε

(66)

= n·

I(UT, W1,T, T;YT, ZT|E= 0)−I(W1,T, T;ST|UT, E= 0)

− I(T;YT, ZT|E= 0) +I(T;ST|UT, E= 0) + 2ε

(67)

≤ n·

I(UT, W1,T, T;YT, ZT|E= 0)−I(W1,T, T;ST|UT, E= 0) + 2ε

(68)

≤ n·max

Q∈Q

I(UT, W1;YT, ZT|E= 0)−I(W1;ST|UT, E= 0) + 2ε

(69)

≤ n·max

Q∈Q

I(U, W1;Y, Z)−I(W1;S|U) + 3ε

. (70)

Equation (66) comes from the introduction of the uniform random variable T over {1, . . . , n} and the corresponding mean random variables U

T

, S

T

, Z

T

, W

_1,T

, X

T

, Y

T

, U ˆ

T

.

Equation (67) comes from the properties of the mutual information.

Equation (68) comes from the independence of T with (U

T

, S

T

) that implies I(T ; S

T

|U

T

, E = 0) = 0, as stated in Lemma 11 in the Appendix. The memoryless property of the channel guarantees that the pair of random variable (W

_1,T

, T ) satisfies the Markov chain Y

T

− − (X

T

, S

T

) − − (U

T

, W

_1,T

, T ). Hence the pair (W

_1,T

, T ) belongs to the set of probability distributions

Q_l

, as stated in Lemma 7 in the Appendix.

Equation (69) comes from taking the maximum over the probability distributions Q that belong to

Q_l

. Equation (70) comes from Lemma 9 that states that the probability distribution induced by the coding scheme P (U

T

, S

T

, Z

T

, X

T

, Y

T

, U ˆ

T

) = (u, s, z, x, y, u) ˆ

E = 0

is closed to the target probability distribution Q(u, s, z, x, y, u). The continuity of the entropy function stated pp. 33 in [10] concludes. ˆ

Combining equations (57) and (70) gives equation (71). It is satisfied for all c(n) ∈ C that achieves the probability distribution Q(u, s, z, x, y, u). ˆ

H(U ) ≤ max

Q∈Q

I(U, W

₁

; Y, Z) − I(W

₁

; S|U ) + 4ε

. (71)

This concludes the converse proof of Theorem III.1.

(22)

Lemma 1 By Fano’s inequality, we have the following equation:

H(U

ⁿ

|Y

ⁿ

, Z

ⁿ

, E = 0) ≤ n · ε. (72)

Proof III.3 (Proof of Lemma 1) The decoding g : Y

ⁿ

× Z

ⁿ

7→ U

ⁿ

is deterministic, hence H( ˆ U

ⁿ

|Y

ⁿ

, Z

ⁿ

, E = 0) = 0. We have the following equations:

H(U

ⁿ

|Y

ⁿ

, Z

ⁿ

, E = 0) ≤ H(U

ⁿ

, U ˆ

ⁿ

|Y

ⁿ

, Z

ⁿ

, E = 0) (73)

≤ H( ˆ U

ⁿ

|Y

ⁿ

, Z

ⁿ

, E = 0) + H(U

ⁿ

| U ˆ

ⁿ

, Y

ⁿ

, Z

ⁿ

, E = 0) (74)

≤ H(U

ⁿ

| U ˆ

ⁿ

, E = 0) (75)

≤ n ·

H(U | U ˆ ) + ε

(76)

= n · ε. (77)

Equations (73) and (74) come from the properties of the entropy.

Equation (75) comes from the deterministic decoding: U ˆ

ⁿ

is a deterministic function of (Y

ⁿ

, Z

ⁿ

).

Equation (76) comes from the cardinality bound log

₂

u

ⁿ

s.t. u

ⁿ

∈ A

^⋆n_ε

(ˆ u

ⁿ

) ≤ n ·

H(U | U ˆ ) + ε , on the set of sequences u

ⁿ

that are jointly typical with u ˆ

ⁿ

.

Equation (77) comes from the target joint distribution Q(u, s, z, x, y, u) ˆ that satisfy Q(ˆ u|u) = 1 ^(ˆ ^u|u)

hence H(U | U ˆ ) = 0.

(23)

IV. S

EPARATION BETWEEN

S

OURCE AND

C

HANNEL

U

ⁿ

S

ⁿ

Z

ⁿ

X

ⁿ

Y

ⁿ

V

ⁿ

P

uz

P

s

C T D

Fig. 4. Random variables of the source(U, Z, V)are independent of the random variables of the channel(S, X, Y).

Theorem IV.1 (Separation between Source and Channel)

1) The product of probability distribution Q(u, z, v)⊗Q(s, x, y) is achievable if and only if it decomposes as follows:











Q(u, z) = P

_uz

(u, z), Q(s) = P

_s

(s),

Q(y|x, s) = T (y|x, s).

(78)

and P

uz

(u, z) ⊗ Q(v|u, z) ⊗ P

s

(s) ⊗ Q(x|s) ⊗ T (y|x, s) is achievable.

2) Joint probability distribution P

uz

(u, z) ⊗ Q(v|u, z) ⊗ P

s

(s) ⊗ Q(x|s) ⊗ T (y|x, s) is achievable if:

max

Q∈Qs

I (W

₁

; Y ) + I (W

₂

; Z) − I(W

₁

; S) − I(W

₂

; U )

> 0, (79)

3) Joint probability distribution P

uz

(u, z) ⊗ Q(v|u, z) ⊗ P

s

(s) ⊗ Q(x|s) ⊗ T (y|x, s) is not achievable if:

max

Q∈Qs

I (W

₁

; Y ) + I (W

₂

; Z) − I(W

₁

; S) − I(W

₂

; U )

< 0, (80)

where

Qs

is the set Q ∈ ∆(U × Z × W

₂

× V ) ×∆(S × W

₁

× X × Y ) of product of probability distributions with auxiliary random variables (W

₁

, W

₂

) that satisfies:









 P

(w1,w2)∈W1×W2Q(u, z, w2, v)⊗ Q(s, x, w1, y)

=Puz(u, z)⊗ Q(v|u, z)⊗ Ps(s)⊗ Q(x|s)⊗ T(y|x, s), Y −−(X, S)−−W1,

Z−−U−−W2, V −−(Z, W2)−−U.

The probability distribution Q ∈

Qs

decomposes as follows:

P

_uz

(u, z) ⊗ Q(w

₂

|u) ⊗ Q(v|z, w

₂

) ⊗ P

_s

(s) ⊗ Q(x|s) ⊗ Q(w

₁

|s, x) ⊗ T (y|x, s).

(24)

The supports of the auxiliary random variables (W

₁

, W

₂

) are bounded by max(|W

₁

|, |W

₂

|) ≤ (|B| + 1) · (|B| + 2) with B = U × S × Z × X × Y × V.

Remark IV.2 This separation result was already stated in [6] for a given distortion level and a given

channel cost.