
8.5 Ergodic Theorems for Information Densities


As an application of the general theorem we prove an ergodic theorem for mutual information densities for stationary and ergodic sources. The result can be extended to AMS sources in the same manner that the results of Section 8.3 were extended to those of Section 8.4. As the stationary and ergodic result suffices for the coding theorems and the AMS conditions are messy, only the stationary case is considered here. The result is due to Barron [9].

Theorem 8.5.1: Let $\{X_n, Y_n\}$ be a stationary ergodic pair random process with standard alphabets. Let $P_{X^nY^n}$, $P_{X^n}$, and $P_{Y^n}$ denote the induced distributions and assume that for all $n$, $P_{X^n} \times P_{Y^n} \gg P_{X^nY^n}$, and hence the information densities

$$ i_n(X^n;Y^n) = \ln \frac{dP_{X^nY^n}}{d(P_{X^n}\times P_{Y^n})}(X^n,Y^n) $$

are well defined. Assume in addition that both the $\{X_n\}$ and $\{Y_n\}$ processes have the finite-gap information property of (8.9) and hence, by the comment following Corollary 7.3.1, there is a $K$ such that both processes satisfy the $K$-gap property

$$ I(X_K; X^- \mid X^K) < \infty, \qquad I(Y_K; Y^- \mid Y^K) < \infty. $$

Then

$$ \lim_{n\to\infty} \frac{1}{n}\, i_n(X^n;Y^n) = \bar{I}(X;Y) \quad p\text{-a.e.} $$
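As a reminder of the notation, restating definitions from earlier chapters rather than adding anything new: the expectation of the information density is the ordinary mutual information, so the almost-everywhere limit claimed by the theorem is the mutual information rate,

$$ E\, i_n(X^n;Y^n) = I(X^n;Y^n), \qquad \bar{I}(X;Y) = \lim_{n\to\infty} \frac{1}{n} I(X^n;Y^n). $$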

Proof: Let $Z_n = (X_n, Y_n)$. Let $M_{X^n} = P^{(K)}_{X^n}$ and $M_{Y^n} = P^{(K)}_{Y^n}$ denote the $K$th order Markov approximations of $\{X_n\}$ and $\{Y_n\}$, respectively. The finite-gap information property implies, as in Section 8.3, that the densities

$$ f_{X^n} = \frac{dP_{X^n}}{dM_{X^n}} \quad\text{and}\quad f_{Y^n} = \frac{dP_{Y^n}}{dM_{Y^n}} $$

are well defined. From Theorem 8.2.1,

$$ \lim_{n\to\infty} \frac{1}{n} \ln f_{X^n}(X^n) = H_{p_X \| p^{(K)}_X}(X_0 \mid X^-) = I(X_K; X^- \mid X^K) < \infty, $$

$$ \lim_{n\to\infty} \frac{1}{n} \ln f_{Y^n}(Y^n) = I(Y_K; Y^- \mid Y^K) < \infty. $$

Define the measures $M_{Z^n} = M_{X^n} \times M_{Y^n}$. This is a $K$-step Markov source, and since

$$ M_{X^n} \times M_{Y^n} \gg P_{X^n} \times P_{Y^n} \gg P_{X^nY^n} = P_{Z^n}, $$

the density

$$ f_{Z^n} = \frac{dP_{Z^n}}{dM_{Z^n}} $$

is well defined and from Theorem 8.2.1 has a limit

$$ \lim_{n\to\infty} \frac{1}{n} \ln f_{Z^n}(Z^n) = H_{p\|m}(Z_0 \mid Z^-). $$

If the density $i_n(X^n;Y^n)$ is infinite for any $n$, then it is infinite for all larger $n$ and convergence is trivially to the infinite information rate. If it is finite, the chain rule for densities (using $d(P_{X^n}\times P_{Y^n})/dM_{Z^n} = f_{X^n} f_{Y^n}$) yields

$$ \frac{1}{n}\, i_n(X^n;Y^n) = \frac{1}{n}\ln f_{Z^n}(Z^n) - \frac{1}{n}\ln f_{X^n}(X^n) - \frac{1}{n}\ln f_{Y^n}(Y^n) $$

$$ \xrightarrow[n\to\infty]{} H_{p\|p^{(K)}}(Z_0\mid Z^-) - H_{p\|p^{(K)}}(X_0\mid X^-) - H_{p\|p^{(K)}}(Y_0\mid Y^-) $$

$$ = \bar{H}_{p\|p^{(K)}}(X,Y) - \bar{H}_{p\|p^{(K)}}(X) - \bar{H}_{p\|p^{(K)}}(Y). $$

The limit is not indeterminate (of the form $\infty - \infty$) because the two subtracted terms are finite. Since convergence is to a constant, the constant must also be the limit of the expected values of $n^{-1} i_n(X^n;Y^n)$, that is, $\bar{I}(X;Y)$. $\Box$
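As an illustration (not drawn from the text), here is a minimal numerical sketch of the theorem in Python for the simplest stationary ergodic example: an i.i.d. fair-coin input passed through a memoryless binary symmetric channel. In this case the information density is a sum of i.i.d. terms, so the theorem reduces to the strong law of large numbers; all names (EPS, N, rng) are illustrative choices.

```python
import numpy as np

EPS = 0.1        # assumed channel crossover probability
N = 200_000      # sample-path length
rng = np.random.default_rng(0)

x = rng.integers(0, 2, size=N)                 # input: fair coin flips
y = x ^ (rng.random(N) < EPS).astype(int)      # output: X through a BSC

# Per-symbol information density ln[P(y|x)/P(y)] with P(y) = 1/2:
dens = np.where(y == x, np.log(2 * (1 - EPS)), np.log(2 * EPS))

# (1/n) i_n(X^n; Y^n) along the sample path versus the rate Ibar(X;Y):
i_rate = dens.cumsum() / np.arange(1, N + 1)
I_bar = np.log(2) + EPS * np.log(EPS) + (1 - EPS) * np.log(1 - EPS)

print(f"(1/n) i_n at n = {N}: {i_rate[-1]:.4f}")
print(f"Ibar(X;Y) in nats:    {I_bar:.4f}")
```

For this memoryless pair process the densities factor term by term, which is why the almost-everywhere limit is elementary; the content of Theorem 8.5.1 is that the same conclusion holds assuming only stationarity, ergodicity, and the finite-gap conditions.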

Chapter 9

Channels and Codes

9.1 Introduction

We have considered a random process or source $\{X_n\}$ as a sequence of random entities, where the object produced at each time could be quite general, e.g., a random variable, vector, or waveform. Hence sequences of pairs of random objects such as $\{X_n, Y_n\}$ are included in the general framework. We now focus on the possible interrelations between the two components of such a pair process.

In particular, we consider the situation where we begin with one source, say $\{X_n\}$, called the input, and use either a random or a deterministic mapping to form a new source $\{Y_n\}$, called the output. We generally refer to the mapping as a channel if it is random and a code if it is deterministic. Hence a code is a special case of a channel, and results for channels will immediately imply the corresponding results for codes. The initial point of interest will be conditions on the structure of the channel under which the resulting pair process $\{X_n, Y_n\}$ will inherit stationarity and ergodic properties from the original source $\{X_n\}$.

We will also be interested in the behavior resulting when the output of one channel serves as the input to another, that is, when we form a new channel as a cascade of other channels. Such cascades yield models of a communication system, which typically has a code mapping (called the encoder) followed by a channel followed by another code mapping (called the decoder).
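These definitions are easy to make concrete. The following minimal sketch (in Python; the functions and parameters are hypothetical illustrations, not constructions from the text) shows a random channel, two deterministic codes, and their cascade into a communication system:

```python
import numpy as np

rng = np.random.default_rng(1)

def bsc_channel(x, eps=0.1):
    """A channel (random mapping): flips each binary symbol w.p. eps."""
    return x ^ (rng.random(len(x)) < eps).astype(int)

def repetition_encoder(x):
    """A code (deterministic mapping): one input symbol -> three outputs."""
    return np.repeat(x, 3)

def majority_decoder(y):
    """A code: majority vote over each received three-tuple."""
    return (y.reshape(-1, 3).sum(axis=1) >= 2).astype(int)

def system(x):
    """Communication system: encoder -> channel -> decoder cascade."""
    return majority_decoder(bsc_channel(repetition_encoder(x)))

x = rng.integers(0, 2, size=10_000)
print("symbol error rate:", np.mean(system(x) != x))
```

Note that the encoder here already exhibits the timing issue taken up next: it produces three output symbols for every input symbol.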

A fundamental nuisance in the development is the notion of time. So far we have considered pair processes where at each unit of time, one random object is produced for each coordinate of the pair. In the channel or code example, this corresponds to one output for every input. Interesting communication systems do not always easily fit into this framework, and this can cause serious problems in notation and in the interpretation and development of results. For example, suppose that an input source consists of a sequence of real numbers and let $T$ denote the time shift on the real sequence space. Suppose that the output source consists of a binary sequence and let $S$ denote its shift. Suppose also that the channel is such that for each real number input, three binary symbols are produced. This fits our usual framework if we consider each output variable to consist of a binary three-tuple, since then there is one output vector for each input symbol. One must be careful, however, when considering the stationarity of such a system. Do we consider the output process to be physically stationary if it is stationary with respect to $S$ or with respect to $S^3$? The former might make more sense if we are looking at the output alone, the latter if we are looking at the output in relation to the input. How do we define stationarity for the pair process? Given two sequence spaces, we might first construct a shift on the pair sequence space as simply the cartesian product of the shifts, e.g., given an input sequence $x$ and an output sequence $y$, define a shift $T$ by $T(x, y) = (Tx, Sy)$.

While this might seem natural given simply the pair random process $\{X_n, Y_n\}$, it is not natural in the physical context that one symbol of $X$ yields three symbols of $Y$. In other words, the two shifts do not correspond to the same amount of time. Here the more physically meaningful shift on the pair space would be $T'(x, y) = (Tx, S^3 y)$, and the more physically meaningful questions on stationarity and ergodicity relate to $T'$ and not to $T$. The problem becomes even more complicated when channels or codes produce a varying number of output symbols for each input symbol, where the number of symbols depends on the input sequence. Such variable-rate codes arise often in practice, especially for noiseless coding applications such as Huffman, Lempel-Ziv, and arithmetic codes. (See [140] for a survey of noiseless coding.) While we will not treat such variable-rate systems in any detail, they point out the difficulty that can arise in associating the mathematical shift operation with physical time when we are considering cartesian products of spaces, each having its own shift.
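The bookkeeping is easy to see in a few lines. This sketch (hypothetical Python, not from the text) contrasts the product shift $T(x,y) = (Tx, Sy)$ with the physically matched shift $T'(x,y) = (Tx, S^3 y)$ when each input symbol produces three output symbols:

```python
def T(x):
    """Input shift: advance the real-valued sequence by one symbol."""
    return x[1:]

def S(y):
    """Output shift: advance the binary sequence by one symbol."""
    return y[1:]

def product_shift(x, y):
    """T(x, y) = (Tx, Sy): shifts by unequal amounts of physical time."""
    return T(x), S(y)

def matched_shift(x, y):
    """T'(x, y) = (Tx, S^3 y): one unit of physical time on both."""
    return T(x), S(S(S(y)))

x = [3.1, -0.7, 2.4]                    # real-valued input sequence
y = [b for v in x for b in (1, 0, 1)]   # three binary outputs per input

x1, y1 = matched_shift(x, y)
print(len(y1) == 3 * len(x1))   # True: three outputs per input preserved

x2, y2 = product_shift(x, y)
print(len(y2) == 3 * len(x2))   # False: the output now lags the input
```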

There is no easy way to solve this problem notationally. We adopt the following view as a compromise which is usually adequate for fixed-rate systems.

We will be most interested in pair processes that are stationary in the physical sense, that is, whose statistics are not changed when both are shifted by an equal amount of physical time. This is the same as stationarity with respect to the product shift if the two shifts correspond to equal amounts of physical time. Hence for simplicity we will usually focus on this case. More general cases will be introduced when appropriate to point out their form and how they can be put into the matching shift structure by considering groups of symbols and different shifts. This will necessitate occasional discussions about what is meant by stationarity or ergodicity for a particular system.

The mathematical generalizations of Shannon's original notions of sources, codes, and channels are due to Khinchine [72] [73]. Khinchine's results characterizing stationarity and ergodicity of channels were corrected and developed by Adler [2].
