

1.11.2 Decoding and Shannon’s Theorem

Figure 1.2 Binary symmetric channel: if 0 or 1 is sent, it is received correctly with probability 1 − ϱ and flipped with probability ϱ.


The process of decoding, that is, determining which codeword (and thus which message x) was sent when a vector y is received, is more complex. Finding efficient (fast) decoding algorithms is a major area of research in coding theory because of their practical applications.

In general, encoding is easy and decoding is hard if the code is reasonably large.

In order to set the stage for decoding, we begin with one possible mathematical model of a channel that transmits binary data. This model is called the binary symmetric channel (or BSC) with crossover probability ϱ and is illustrated in Figure 1.2. If 0 or 1 is sent, the probability it is received without error is 1 − ϱ; if a 0 (respectively 1) is sent, the probability that a 1 (respectively 0) is received is ϱ. In most practical situations ϱ is very small. This is an example of a discrete memoryless channel (or DMC), a channel in which inputs and outputs are discrete and the probability of error in one bit is independent of previous bits. We will assume that it is more likely that a bit is received correctly than in error; so ϱ < 1/2.⁵

If E1 and E2 are events, let prob(E1) denote the probability that E1 occurs and prob(E1 | E2) the probability that E1 occurs given that E2 occurs. Assume that c ∈ F_2^n is sent and y ∈ F_2^n is received and decoded as ĉ ∈ F_2^n. So prob(c | y) is the probability that the codeword c is sent given that y is received, and prob(y | c) is the probability that y is received given that the codeword c is sent. These probabilities can be computed from the statistics associated with the channel. The probabilities are related by Bayes' Rule

prob(c | y) = prob(y | c) prob(c) / prob(y),

where prob(c) is the probability that c is sent and prob(y) is the probability that y is received.

There are two natural means by which a decoder can make a choice based on these two probabilities. First, the decoder could choose ĉ = c for the codeword c with prob(c | y) maximum; such a decoder is called a maximum a posteriori probability (or MAP) decoder.

⁵ While ϱ is usually very small, if ϱ > 1/2, the probability that a bit is received in error is higher than the probability that it is received correctly. So one strategy is to interchange 0 and 1 immediately at the receiving end. This converts the BSC with crossover probability ϱ to a BSC with crossover probability 1 − ϱ < 1/2. This of course does not help if ϱ = 1/2; in this case communication is not possible – see Exercise 77.

Symbolically, a MAP decoder makes the decision

ĉ = arg max_{c ∈ C} prob(c | y).

Here arg max_{c ∈ C} prob(c | y) is the argument c of the probability function prob(c | y) that maximizes this probability. Alternatively, the decoder could choose ĉ = c for the codeword c with prob(y | c) maximum; such a decoder is called a maximum likelihood (or ML) decoder.

Symbolically, an ML decoder makes the decision

ĉ = arg max_{c ∈ C} prob(y | c).   (1.9)

Consider ML decoding over a BSC. If y = y1 · · · yn and c = c1 · · · cn, then

prob(y | c) = Π_{i=1}^{n} prob(y_i | c_i),

since we assumed that bit errors are independent. By Figure 1.2, prob(y_i | c_i) = ϱ if y_i ≠ c_i and prob(y_i | c_i) = 1 − ϱ if y_i = c_i. Therefore

prob(y | c) = ϱ^{d(y,c)} (1 − ϱ)^{n − d(y,c)} = (1 − ϱ)^n (ϱ/(1 − ϱ))^{d(y,c)}.   (1.10)

Since 0 < ϱ < 1/2, we have 0 < ϱ/(1 − ϱ) < 1. Therefore maximizing prob(y | c) is equivalent to minimizing d(y, c), that is, finding the codeword c closest to the received vector y in Hamming distance; this is called nearest neighbor decoding. Hence on a BSC, maximum likelihood and nearest neighbor decoding are the same.
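This equivalence can be checked directly on a toy code. The following is a minimal sketch (not from the text; the function names and the repetition-code example are our own) that decodes both by maximizing the likelihood (1.10) and by minimizing Hamming distance, and confirms that the two agree.

```python
def hamming_distance(y, c):
    return sum(yi != ci for yi, ci in zip(y, c))

def likelihood(y, c, rho):
    # prob(y | c) = rho^d * (1 - rho)^(n - d), equation (1.10)
    d = hamming_distance(y, c)
    return rho**d * (1 - rho)**(len(y) - d)

def ml_decode(y, code, rho):
    return max(code, key=lambda c: likelihood(y, c, rho))

def nearest_neighbor_decode(y, code):
    return min(code, key=lambda c: hamming_distance(y, c))

if __name__ == "__main__":
    # The [3,1,3] binary repetition code as a toy example.
    code = [(0, 0, 0), (1, 1, 1)]
    y = (1, 0, 1)            # received vector with one bit in error
    rho = 0.01               # crossover probability, rho < 1/2
    assert ml_decode(y, code, rho) == nearest_neighbor_decode(y, code) == (1, 1, 1)
```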

Let e = y − c so that y = c + e. The effect of noise in the communication channel is to add an error vector e to the codeword c, and the goal of decoding is to determine e. Nearest neighbor decoding is equivalent to finding a vector e of smallest weight such that y − e is in the code. This error vector need not be unique since there may be more than one codeword closest to y; in other words, (1.9) may not have a unique solution. When we have a decoder capable of finding all codewords nearest to the received vector y, then we have a complete decoder.

To examine vectors closest to a given codeword, the concept of spheres about codewords proves useful. The sphere of radius r centered at a vector u in F_q^n is defined to be the set

S_r(u) = {v ∈ F_q^n | d(u, v) ≤ r}

of all vectors whose distance from u is less than or equal to r. The number of vectors in S_r(u) equals

Σ_{i=0}^{r} (n choose i) (q − 1)^i.   (1.11)
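As a quick illustration of (1.11), here is a small sketch (our own code, not from the text) that evaluates the formula and checks it against a brute-force count of the sphere for small parameters.

```python
from itertools import product
from math import comb

def sphere_size(n, q, r):
    # Equation (1.11): sum_{i=0}^{r} (n choose i) (q - 1)^i
    return sum(comb(n, i) * (q - 1)**i for i in range(r + 1))

def sphere_size_brute_force(n, q, r):
    u = (0,) * n                      # by symmetry the choice of center does not matter
    return sum(1 for v in product(range(q), repeat=n)
               if sum(vi != ui for vi, ui in zip(v, u)) <= r)

if __name__ == "__main__":
    assert sphere_size(4, 3, 2) == sphere_size_brute_force(4, 3, 2) == 1 + 8 + 24
```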

These spheres are pairwise disjoint provided their radius is chosen small enough.

Theorem 1.11.2 If d is the minimum distance of a code C (linear or nonlinear) and t = ⌊(d − 1)/2⌋, then spheres of radius t about distinct codewords are disjoint.


Proof: If z ∈ S_t(c1) ∩ S_t(c2), where c1 and c2 are codewords, then by the triangle inequality (Theorem 1.4.1(iv)),

d(c1, c2) ≤ d(c1, z) + d(z, c2) ≤ 2t < d,

implying that c1 = c2.

Corollary 1.11.3 With the notation of the previous theorem, if a codeword c is sent and y is received where t or fewer errors have occurred, then c is the unique codeword closest to y. In particular, nearest neighbor decoding uniquely and correctly decodes any received vector in which at most t errors have occurred in transmission.

Exercise 65 Prove that the number of vectors in S_r(u) is given by (1.11).

For purposes of decoding as many errors as possible, this corollary implies that for given n and k, we wish to find a code with as high a minimum weight d as possible. Alternately, given n and d, one wishes to send as many messages as possible; thus we want to find a code with the largest number of codewords, or, in the linear case, the highest dimension.

We may relax these requirements somewhat if we can find a code with an efficient decoding algorithm.

Since the minimum distance of C is d, there exist two distinct codewords such that the spheres of radius t + 1 about them are not disjoint. Therefore if more than t errors occur, nearest neighbor decoding may yield more than one nearest codeword. Thus C is a t-error-correcting code but not a (t + 1)-error-correcting code. The packing radius of a code is the largest radius of spheres centered at codewords so that the spheres are pairwise disjoint.

This discussion shows the following two facts about the packing radius.

Theorem 1.11.4 Let C be an [n, k, d] code over F_q. The following hold:

(i) The packing radius of C equals t = ⌊(d − 1)/2⌋.

(ii) The packing radius t of C is characterized by the property that nearest neighbor decoding always decodes correctly a received vector in which t or fewer errors have occurred but will not always decode correctly a received vector in which t + 1 errors have occurred.

The decoding problem now becomes one of finding an efficient algorithm that will correct up to t errors. One of the most obvious decoding algorithms is to examine all codewords until one is found with distance t or less from the received vector. But obviously this is a realistic decoding algorithm only for codes with a small number of codewords. Another obvious algorithm is to make a table consisting of a nearest codeword for each of the q^n vectors in F_q^n and then look up a received vector in the table in order to decode it. This is impractical if q^n is very large.

For an [n, k, d] linear code C over F_q, we can, however, devise an algorithm using a table with q^{n−k} rather than q^n entries, where one can find the nearest codeword by looking up one of these q^{n−k} entries. This general decoding algorithm for linear codes is called syndrome decoding. Because our code C is an elementary abelian subgroup of the additive group of F_q^n, its distinct cosets x + C partition F_q^n into q^{n−k} sets of size q^k. Two vectors x and y belong to the same coset if and only if y − x ∈ C. The weight of a coset is the smallest weight of a vector in the coset, and any vector of this smallest weight in the coset is called

a coset leader. The zero vector is the unique coset leader of the code C. More generally, every coset of weight at most t = ⌊(d − 1)/2⌋ has a unique coset leader.

Exercise 66 Do the following:

(a) Prove that if C is an [n, k, d] code over F_q, every coset of weight at most t = ⌊(d − 1)/2⌋ has a unique coset leader.

(b) Find a nonzero binary code of length 4 and minimum weight d in which all cosets have unique coset leaders and some coset has weight greater than t = ⌊(d − 1)/2⌋.

Choose a parity check matrix H for C. The syndrome of a vector x in F_q^n with respect to the parity check matrix H is the vector in F_q^{n−k} defined by

syn(x) = Hx^T.

The code C consists of all vectors whose syndrome equals 0. As H has rank n − k, every vector in F_q^{n−k} is a syndrome. If x1, x2 ∈ F_q^n are in the same coset of C, then x1 − x2 = c ∈ C.

Therefore syn(x1) = H(x2 + c)^T = Hx2^T + Hc^T = Hx2^T = syn(x2). Hence x1 and x2 have the same syndrome. On the other hand, if syn(x1) = syn(x2), then H(x2 − x1)^T = 0 and so x2 − x1 ∈ C. Thus we have the following theorem.

Theorem 1.11.5 Two vectors belong to the same coset if and only if they have the same syndrome.

Hence there is a one-to-one correspondence between cosets of C and syndromes. We denote by C_s the coset of C consisting of all vectors in F_q^n with syndrome s.

Suppose a codeword sent over a communication channel is received as a vector y. Since in nearest neighbor decoding we seek a vector e of smallest weight such that y − e ∈ C, nearest neighbor decoding is equivalent to finding a vector e of smallest weight in the coset containing y, that is, a coset leader of the coset containing y. The Syndrome Decoding Algorithm is the following implementation of nearest neighbor decoding. We begin with a fixed parity check matrix H.

I. For each syndrome s ∈ F_q^{n−k}, choose a coset leader e_s of the coset C_s. Create a table pairing the syndrome with the coset leader.

This process can be somewhat involved, but this is a one-time preprocessing task that is carried out before received vectors are analyzed. One method of computing this table will be described shortly. After producing the table, received vectors can be decoded.

II. After receiving a vector y, compute its syndrome s using the parity check matrix H.

III. y is then decoded as the codeword y − e_s.

Syndrome decoding requires a table with only q^{n−k} entries, which may be a vast improvement over a table of q^n vectors showing which codeword is closest to each of these.

However, there is a cost for shortening the table: before looking in the table of syndromes, one must perform a matrix-vector multiplication in order to determine the syndrome of the received vector. Then the table is used to look up the syndrome and find the coset leader.

How do we construct the table of syndromes as described in Step I? We briefly discuss this for binary codes; one can extend this easily to nonbinary codes. Given the t-error-correcting code C of length n with parity check matrix H, we can construct the syndromes as follows. The coset of weight 0 has coset leader 0. Consider the n cosets of weight 1.


Choose an n-tuple with a 1 in position i and 0s elsewhere; the coset leader is the n-tuple and the associated syndrome is column i of H. For the (n choose 2) cosets of weight 2, choose an n-tuple with two 1s in positions i and j, with i < j, and the rest 0s; the coset leader is the n-tuple and the associated syndrome is the sum of columns i and j of H. Continue in this manner through the cosets of weight t. We could choose to stop here. If we do, we can decode any received vector with t or fewer errors, but if the received vector has more than t errors, it will be either incorrectly decoded (if the syndrome of the received vector is in the table) or not decoded at all (if the syndrome of the received vector is not in the table). If we decide to go on and compute syndromes for cosets of weight w greater than t, we continue in the same fashion with the added feature that we must check for possible repetition of syndromes.

This repetition will occur if the n-tuple of weight w is not a coset leader or it is a coset leader with the same syndrome as another leader of weight w, in which cases we move on to the next n-tuple. We continue until we have 2^{n−k} syndromes. The table produced will allow us to perform nearest neighbor decoding.
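The following is a minimal sketch of Steps I–III for a binary code (our own code, not from the text; the function names are ours). Error patterns are tried in order of increasing weight, so the first pattern recorded for each syndrome is a coset leader. The example parity check matrix is that of the [7,4,3] binary Hamming code discussed in the next paragraph, whose columns are the binary numerals for 1, . . . , 7.

```python
from itertools import combinations

def syndrome(H, x):
    # syn(x) = H x^T over F_2, returned as a tuple so it can serve as a dictionary key
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

def build_table(H, n):
    # Step I: pair every syndrome with a coset leader, trying error patterns by weight
    table = {}
    for w in range(n + 1):
        for support in combinations(range(n), w):
            e = [0] * n
            for i in support:
                e[i] = 1
            s = syndrome(H, e)
            if s not in table:              # first (hence lowest weight) vector seen: a coset leader
                table[s] = e
        if len(table) == 2 ** len(H):       # all 2^(n-k) syndromes have been found
            return table
    return table

def syndrome_decode(H, table, y):
    # Steps II-III: look up the coset leader of y's coset and subtract it
    e = table[syndrome(H, y)]
    return [(yi + ei) % 2 for yi, ei in zip(y, e)]

if __name__ == "__main__":
    H = [[0, 0, 0, 1, 1, 1, 1],             # column i is the binary numeral for i
         [0, 1, 1, 0, 0, 1, 1],
         [1, 0, 1, 0, 1, 0, 1]]
    table = build_table(H, 7)
    y = [1, 1, 0, 1, 0, 0, 0]               # codeword 1101001 with an error in position 7
    c = syndrome_decode(H, table, y)
    assert c == [1, 1, 0, 1, 0, 0, 1] and syndrome(H, c) == (0, 0, 0)
```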

Syndrome decoding is particularly simple for the binary Hamming codes H_r with parameters [n = 2^r − 1, 2^r − 1 − r, 3]. We do not have to create the table of syndromes and corresponding coset leaders. This is because the coset leaders are unique and are the 2^r vectors of weight at most 1. Let H_r be the parity check matrix whose columns are the binary numerals for the numbers 1, 2, . . . , 2^r − 1. Since the syndrome of the binary n-tuple of weight 1 whose unique 1 is in position i is the r-tuple representing the binary numeral for i, the syndrome immediately gives the coset leader and no table is required for syndrome decoding. Thus Syndrome Decoding for Binary Hamming Codes takes the form:

I. After receiving a vector y, compute its syndrome s using the parity check matrix H_r.

II. If s = 0, then y is in the code and y is decoded as y; otherwise, s is the binary numeral for some positive integer i and y is decoded as the codeword obtained from y by adding 1 to its ith bit.

The above procedure is easily modified for Hamming codes over other fields. This is explored in the exercises.
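As an illustration, here is a short sketch of the two steps above (our own code; the function name is ours). Because column i of H_r is the binary numeral for i, the syndrome of y, read as an integer, is the XOR of the positions of y holding a 1, and it names the position to correct.

```python
def hamming_decode(y, r):
    # Syndrome decoding for the binary Hamming code H_r; no table is needed.
    n = 2**r - 1
    assert len(y) == n
    # Step I: the syndrome, read as an integer, is the XOR of the 1-positions of y.
    s = 0
    for i, bit in enumerate(y, start=1):
        if bit:
            s ^= i
    # Step II: s = 0 means y is already a codeword; otherwise flip the bit in position s.
    decoded = list(y)
    if s != 0:
        decoded[s - 1] ^= 1
    return decoded

if __name__ == "__main__":
    c = [1, 1, 0, 1, 0, 0, 1]        # a codeword of the [7,4,3] Hamming code (syndrome 0)
    y = list(c); y[4] ^= 1           # one error, in position 5
    assert hamming_decode(y, 3) == c
```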

Exercise 67 Construct the parity check matrix of the binary Hamming code H_4 of length 15 where the columns are the binary numbers 1, 2, . . . , 15 in that order. Using this parity check matrix, decode the following vectors, and then check that your decoded vectors are actually codewords.

(a) 001000001100100, (b) 101001110101100,

(c) 000100100011000.

Exercise 68 Construct a table of all syndromes of the ternary tetracode of Example 1.3.3 using the generator matrix of that example to construct the parity check matrix. Find a coset leader for each of the syndromes. Use your parity check matrix to decode the following vectors, and then check that your decoded vectors are actually codewords.

(a) (1,1,1,1), (b) (1,−1,0,−1),

(c) (0,1,0,1).

Exercise 69 Let C be the [6, 3, 3] binary code with a given generator matrix G and parity check matrix H.

(a) Construct a table of coset leaders and associated syndromes for the eight cosets of C.

(b) One of the cosets in part (a) has weight 2. This coset has three coset leaders. Which coset is it and what are its coset leaders?

(c) Using part (a), decode the following received vectors:

(i) 110110, (ii) 110111, (iii) 110001.

(d) For one of the received vectors in part (c) there is ambiguity as to what codeword it should be decoded to. List the other nearest neighbors possible for this received vector.

Exercise 70 Let Ĥ3 be the extended Hamming code with parity check matrix

Ĥ3 =
[ 1 1 1 1 1 1 1 1 ]
[ 0 0 0 0 1 1 1 1 ]
[ 0 0 1 1 0 0 1 1 ]
[ 0 1 0 1 0 1 0 1 ],

whose last three rows list the coordinate numbers 0, 1, . . . , 7 in binary. We can decode Ĥ3 without a table of syndromes and coset leaders using the following algorithm. If y is received, compute syn(y) using the parity check matrix Ĥ3. If syn(y) = (0, 0, 0, 0)^T, then y has no errors. If syn(y) = (1, a, b, c)^T, then there is a single error in coordinate position abc (written in binary). If syn(y) = (0, a, b, c)^T with (a, b, c) ≠ (0, 0, 0), then there are two errors, one in coordinate position 0 and one in coordinate position abc (written in binary).

(a) Decode the following vectors using this algorithm:

(i) 10110101, (ii) 11010010, (iii) 10011100.

(b) Verify that this procedure provides a nearest neighbor decoding algorithm for Ĥ3. To do this, the following must be verified. All weight 0 and weight 1 errors can be corrected, accounting for nine of the 16 syndromes. All weight 2 errors cannot necessarily be corrected, but all weight 2 errors lead to one of the seven remaining syndromes.

A received vector may contain both errors (where a transmitted symbol is read as a different symbol) and erasures (where a transmitted symbol is unreadable). These are fundamentally different in that the locations of errors are unknown, whereas the locations of erasures are known. Suppose c ∈ C is sent, and the received vector y contains ν errors and ε erasures. One could certainly not guarantee that y can be corrected if ε ≥ d because there may be a codeword other than c closer to y. So assume that ε < d. Puncture C in the ε positions where the erasures occurred in y to obtain an [n − ε, k*, d*] code C*. Note that k* = k by Theorem 1.5.7(ii), and d* ≥ d − ε. Puncture c and y similarly to obtain c* and y*; these can be viewed as sent and received vectors using the code C*, with y* containing ν errors but no erasures. If 2ν < d − ε ≤ d*, then c* can be recovered from y* by Corollary 1.11.3. There is a unique codeword c ∈ C which when punctured produces c*; for if puncturing both c and c′ yields c*, then wt(c − c′) ≤ ε < d, a contradiction unless c = c′. The following theorem summarizes this discussion and extends Corollary 1.11.3.

Theorem 1.11.6 Let C be an [n, k, d] code. If a codeword c is sent and y is received where ν errors and ε erasures have occurred, then c is the unique codeword in C closest to y provided 2ν + ε < d.
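For small codes, the guarantee of Theorem 1.11.6 can be realized by brute force: ignore the erased positions and pick the codeword that disagrees with y in the fewest remaining positions. The sketch below is our own (the names and the toy repetition-code example are assumptions, not from the text).

```python
ERASURE = None   # marker for an erased coordinate in the received vector

def errors_outside_erasures(y, c):
    # number of non-erased positions in which y and the codeword c disagree
    return sum(1 for yi, ci in zip(y, c) if yi is not ERASURE and yi != ci)

def decode_errors_and_erasures(y, code):
    # If y has nu errors and eps erasures with 2*nu + eps < d, this returns the sent codeword.
    return min(code, key=lambda c: errors_outside_erasures(y, c))

if __name__ == "__main__":
    # Toy example: the [4,1,4] binary repetition code with one error and one erasure
    # (2*1 + 1 = 3 < 4, so correct decoding is guaranteed).
    code = [(0, 0, 0, 0), (1, 1, 1, 1)]
    y = (1, 0, ERASURE, 1)
    assert decode_errors_and_erasures(y, code) == (1, 1, 1, 1)
```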

Exercise 71 Let Ĥ3 be the extended Hamming code with parity check matrix

Ĥ3 =
[ 1 1 1 1 1 1 1 1 ]
[ 0 0 0 0 1 1 1 1 ]
[ 0 0 1 1 0 0 1 1 ]
[ 0 1 0 1 0 1 0 1 ].

Correct the received vector 101?0111, where ? is an erasure.

In Exercises 70 and 71 we explored the decoding of the [8, 4, 4] extended Hamming code Ĥ3. In Exercise 70, we had the reader verify that there are eight cosets of weight 1 and seven of weight 2. Each of these cosets is a nonlinear code, and so it is appropriate to discuss the weight distribution of these cosets and to tabulate the results. In general, the complete coset weight distribution of a linear code is the weight distribution of each coset of the code. The next example gives the complete coset weight distribution of Ĥ3. As every [8, 4, 4] code is equivalent to Ĥ3 by Exercise 56, this is the complete coset weight distribution of any [8, 4, 4] binary code.

Example 1.11.7 The complete coset weight distribution of the [8, 4, 4] extended binary Hamming code Ĥ3 is given in the following table:

Coset     Number of vectors of given weight               Number
weight    0    1    2    3    4    5    6    7    8       of cosets
0         1    0    0    0   14    0    0    0    1       1
1         0    1    0    7    0    7    0    1    0       8
2         0    0    4    0    8    0    4    0    0       7

Note that the first line is the weight distribution of Ĥ3. The second line is the weight distribution of each coset of weight 1. This code has the special property that all cosets of a given weight have the same weight distribution. This is not the case for codes in general.

In Exercise 73 we ask the reader to verify some of the information in the table. Notice that this code has the all-one vector 1 and hence the table is symmetric about the middle weight.

Notice also that an even weight coset has only even weight vectors, and an odd weight coset has only odd weight vectors. These observations hold in general; see Exercise 72.

The information in this table helps explain the decoding of Ĥ3. We see that all the cosets of weight 2 have four coset leaders. This implies that when we decode a received vector in which two errors have been made, we actually have four equally likely codewords that could have been sent.
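The complete coset weight distribution is easy to verify by machine for a code this small. The following sketch (our own code, not from the text) groups all 2^8 vectors by syndrome and tabulates their weights, reproducing the table of Example 1.11.7; it can also be used to check Exercise 73.

```python
from itertools import product
from collections import defaultdict

H = [[1, 1, 1, 1, 1, 1, 1, 1],      # parity check matrix of the extended Hamming code
     [0, 0, 0, 0, 1, 1, 1, 1],
     [0, 0, 1, 1, 0, 0, 1, 1],
     [0, 1, 0, 1, 0, 1, 0, 1]]

def syndrome(x):
    return tuple(sum(h * xi for h, xi in zip(row, x)) % 2 for row in H)

# Group all 2^8 vectors by syndrome (i.e., by coset) and record their weights.
cosets = defaultdict(lambda: [0] * 9)            # syndrome -> weight distribution
for v in product((0, 1), repeat=8):
    cosets[syndrome(v)][sum(v)] += 1

# Group the 16 cosets by coset weight (the smallest weight occurring in the coset).
by_coset_weight = defaultdict(list)
for dist in cosets.values():
    coset_weight = next(w for w, count in enumerate(dist) if count > 0)
    by_coset_weight[coset_weight].append(dist)

for w in sorted(by_coset_weight):
    print(w, by_coset_weight[w][0], len(by_coset_weight[w]))
# Prints, in agreement with Example 1.11.7:
# 0 [1, 0, 0, 0, 14, 0, 0, 0, 1] 1
# 1 [0, 1, 0, 7, 0, 7, 0, 1, 0] 8
# 2 [0, 0, 4, 0, 8, 0, 4, 0, 0] 7
```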

Exercise 72 Let C be a binary code of length n. Prove the following.

(a) If C is an even code, then an even weight coset of C has only even weight vectors, and an odd weight coset has only odd weight vectors.

(b) If C contains the all-one vector 1, then in a fixed coset, the number of vectors of weight i is the same as the number of vectors of weight n − i, for 0 ≤ i ≤ n.

Exercise 73 Consider the complete coset weight distribution of Ĥ3 given in Example 1.11.7. The results of Exercise 72 will be useful.

(a) Prove that the weight distribution of the cosets of weight 1 is as claimed.

(b) (Harder) Prove that the weight distribution of the cosets of weight 2 is as claimed.

We conclude this section with a discussion of Shannon's Theorem in the framework of the decoding we have developed. Assume that the communication channel is a BSC with crossover probability ϱ on which syndrome decoding is used. The word error rate P_err

for this channel and decoding scheme is the probability that the decoder makes an error, averaged over all codewords of C; for simplicity we assume that each codeword of C is equally likely to be sent. A decoder error occurs when ĉ = arg max_{c ∈ C} prob(y | c) is not the originally transmitted word c when y is received. The syndrome decoder makes a correct decision if y − c is a chosen coset leader. This probability is

ϱ^{wt(y−c)} (1 − ϱ)^{n − wt(y−c)}

by (1.10). Therefore the probability that the syndrome decoder makes a correct decision is Σ_{i=0}^{n} α_i ϱ^i (1 − ϱ)^{n−i}, where α_i is the number of cosets of weight i. Thus

P_err = 1 − Σ_{i=0}^{n} α_i ϱ^i (1 − ϱ)^{n−i}.   (1.12)
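Equation (1.12) is easy to evaluate once the coset weights are known. The sketch below (our own code) computes P_err for the extended Hamming code Ĥ3, whose coset counts α_0 = 1, α_1 = 8, α_2 = 7 are read off from Example 1.11.7; the numerical value quoted in the comment is only illustrative.

```python
def word_error_rate(alpha, n, rho):
    # P_err = 1 - sum_i alpha_i * rho^i * (1 - rho)^(n - i), equation (1.12)
    return 1 - sum(a * rho**i * (1 - rho)**(n - i) for i, a in enumerate(alpha))

if __name__ == "__main__":
    alpha = [1, 8, 7]                        # cosets of weight 0, 1, 2 of the [8,4,4] code
    print(word_error_rate(alpha, 8, 0.01))   # about 0.002 for crossover probability 0.01
```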

Example 1.11.8 Suppose binary messages of length k are sent unencoded over a BSC with crossover probability ϱ. This is in effect the same as using the [k, k] code F_2^k. This code has a unique coset, the code itself, and its leader is the zero codeword of weight 0. Hence (1.12) shows that the probability of decoder error is

P_err = 1 − ϱ^0 (1 − ϱ)^k = 1 − (1 − ϱ)^k.

This is precisely what we expect, as the probability of no decoding error is the probability (1 − ϱ)^k that the k bits are received without error.

