
Source Coding Algorithms

2.5 ZIV-LEMPEL CODING

2.5.3 The LZW Algorithm

In the LZ78 encoding algorithm, including the explicit codeword C(S) of the symbol S along with the index i in the output pair <i, C(S)> is often very wasteful. The LZW algorithm overcomes this inefficiency by dropping C(S) and transmitting the index i only. This is accomplished by initializing the dictionary with a list of single-symbol patterns covering all the symbols of the source alphabet. In each step, the index of the longest match between the input and the dictionary is output and a new pattern is inserted in the dictionary. The new pattern is formed by concatenating the longest match with the next character in the input stream. As a result, the last symbol (or character) of this new pattern is encoded as the first character of the next one.
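Before turning to the example, the following is a minimal sketch of this encoding procedure in Python. It is only an illustration and not part of the standard text; the function name lzw_encode and its arguments are assumptions, and dictionary indexes are taken to start at 1, as in the example below.

    def lzw_encode(text, alphabet):
        # Initialize the dictionary with single-symbol patterns,
        # indexed 1, 2, 3, ... in the order of the source alphabet.
        dictionary = {symbol: i + 1 for i, symbol in enumerate(alphabet)}
        next_index = len(dictionary) + 1
        output = []
        pattern = ""
        for ch in text:
            if pattern + ch in dictionary:
                # The current match can still be extended.
                pattern += ch
            else:
                # Output the index of the longest match and insert the
                # new pattern (longest match + next input character).
                output.append(dictionary[pattern])
                dictionary[pattern + ch] = next_index
                next_index += 1
                # The last character becomes the first character of the
                # next pattern, as described above.
                pattern = ch
        if pattern:
            output.append(dictionary[pattern])
        return output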

2.5.3.1 Example-LZW Encoding: The LZW encoding algorithm is explained below with an example to encode the string b a b a c b a b a b a b c b.

The dictionary generation is shown in Table 2.11. Initially, the dictionary consists of the single-symbol (or character) patterns a, b, and c from the input alphabet {a, b, c}. The indexes of these patterns in the dictionary are 1, 2, and 3, respectively.

After receiving the first character b, the encoder finds the match at index 2. But the pattern b a formed by the first two characters does not have a match in the current dictionary. Hence the encoder outputs index 2 to encode the first character b and inserts the new pattern b a to index 4 in the dictionary.

The second input character a has a match in the dictionary to index 1, but a b formed by the second and third characters does not have a match. As a result the encoder outputs index 1 to encode a and inserts the new pattern a b in the dictionary to index 5.

Now the next two characters b a match with the pattern at index 4, but b a c does not have a match. Hence the encoder outputs index 4 to encode b a and inserts the new pattern b a c in the dictionary to index 6.

The next character c now matches with index 3, but c b does not. Hence the encoder outputs index 3 to encode c and inserts the new pattern c b in the dictionary to index 7.

The next two characters b a again match with the pattern at index 4, but b a b does not. Hence the encoder outputs index 4 to encode b a and inserts the new pattern b a b in the dictionary to index 8.

The next three characters b a b now match with the pattern at index 8, but b a b a does not. Hence the encoder outputs index 8 to encode b a b and inserts the new pattern b a b a in the dictionary to index 9.

The next two characters a b now match with the pattern at index 5 in the dictionary, but a b c does not. Hence the encoder outputs index 5 to encode a b and inserts the new pattern a b c in the dictionary to index 10.

The next two characters c b have a match at index 7 in the dictionary. Hence the encoder outputs index 7 to encode c b and stops.

As a result the output of the encoder is 2 1 4 3 4 8 5 7.
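Assuming the illustrative lzw_encode sketch given earlier in this section (not part of the original text), this result can be checked directly:

    indices = lzw_encode("babacbabababcb", ["a", "b", "c"])
    print(indices)   # [2, 1, 4, 3, 4, 8, 5, 7]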

2.5.3.2 Example-LZW Decoding: In this example, we take the same encoder output from the previous example and decode it using the LZW decoding algorithm. The input to the decoder is 2 1 4 3 4 8 5 7.

Like the encoder, the decoder starts with the initial dictionary with three entries for a, b, c with indexes 1, 2, 3. After visiting the first index 2, the decoder outputs the corresponding pattern b from the dictionary.

The next output is a corresponding to the second input index 1. At this point, the decoder inserts a new pattern b a in the dictionary to index 4.

This new pattern b a is formed by concatenating the first character ( a ) of the current output pattern ( a ) at the end of the last output pattern (b).

The next input index is 4, which corresponds to the pattern b a in the dictionary. Hence the decoder outputs b a and inserts the new pattern a b in the dictionary to index 5. The new pattern a b is again formed by concatenating the first character (b) of the current output pattern b a at the end of the last output pattern a.

The next input index is 3, which corresponds to c in the current dictionary.

The decoder hence outputs c and inserts a new pattern b a c in the dictionary to index 6. This pattern b a c has been formed by concatenating c at the end of the previous output pattern b a.

The next input index is again 4, which corresponds to the pattern b a. The decoder outputs b a and inserts the new pattern c b in the dictionary to index 7, formed by concatenating the first character (b) of the current output b a at the end of the previous output c.

The next input to the decoder is index 8. But the dictionary does not have any pattern with index 8. This tricky situation arises during decoding whenever the encoder uses a pattern in the very next step after inserting it into the dictionary. As a result, in this tricky situation the last character of the pattern is the same as its first character. If we carefully examine the encoding steps in the previous example, we find that index 8 corresponds to the pattern b a b, which has been encoded using the immediately preceding pattern b a. And we get b a b by appending the first character b of b a to b a itself.


Hence, after the decoder receives the input index 8, it discovers that no corresponding pattern exists for this index in the current dictionary. It also recognizes that this is the tricky situation discussed above. The decoder therefore creates the output by appending the first character of the previous output to the end of the previous output itself. Since the previous output was b a, the decoder outputs b a b in the current decoding step and inserts this new pattern in the dictionary to index 8. The following input index 5 corresponds to the pattern a b, and hence the decoder outputs a b and inserts the new pattern b a b a in the dictionary to index 9. This pattern b a b a is formed by concatenating the first character a of the current output a b at the end of the previous output b a b.

The next input index 7 corresponds to the pattern c b. The decoder outputs c b, inserts the new pattern a b c in the dictionary to index 10, and stops. At this point the final dictionary is exactly identical to the final dictionary that was formed during the encoding process, as shown in Table 2.11 in the previous example.
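As an illustration of the complete decoding procedure, including the handling of the tricky situation above, the following minimal Python sketch may be helpful. It is not part of the standard text; the function name lzw_decode and its arguments are assumptions.

    def lzw_decode(indices, alphabet):
        # Inverse dictionary: index -> pattern, initialized with the
        # single-symbol patterns of the source alphabet.
        dictionary = {i + 1: symbol for i, symbol in enumerate(alphabet)}
        next_index = len(dictionary) + 1
        previous = dictionary[indices[0]]
        output = [previous]
        for index in indices[1:]:
            if index in dictionary:
                current = dictionary[index]
            else:
                # Tricky case: the index is not yet in the dictionary.
                # The pattern must be the previous output followed by
                # its own first character.
                current = previous + previous[0]
            output.append(current)
            # Insert previous output + first character of current output.
            dictionary[next_index] = previous + current[0]
            next_index += 1
            previous = current
        return "".join(output)

    text = lzw_decode([2, 1, 4, 3, 4, 8, 5, 7], ["a", "b", "c"])
    print(text)   # babacbabababcb

Decoding the index sequence 2 1 4 3 4 8 5 7 with this sketch reconstructs the original string and rebuilds the same dictionary as the encoder.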

2.6 SUMMARY

In this chapter, we presented some of the key source coding algorithms widely used in data and image compression. First we described the run-length coding scheme with an example. We described the popular Huffman coding scheme that is used in various image and data compression techniques, and we discussed the modified Huffman coding scheme that enhances its efficiency. Arithmetic coding is an alternative approach to efficient entropy encoding, and it achieves compression efficiency very close to the entropy limit. We discussed the basic principles of arithmetic coding with an example and the implementation issues. We described binary arithmetic coding with an example.

Binary arithmetic coding is a key algorithm for bilevel image compression.

Variations of the adaptive implementation of the binary arithmetic coding algorithm have been adopted in different image compression standards: JBIG, JBIG2, JPEG, and JPEG2000. We discussed the QM-coder algorithm for implementation of an adaptive binary arithmetic coding, which has been adopted in the JBIG standard for bilevel image compression and also in a mode of the JPEG standard. A variation of the QM-coder called the MQ-coder is the basis of the entropy encoding of the new JPEG2000 standard for still picture compression.

We also described dictionary-based coding, especially the key algorithms in the popular Ziv-Lempel family of algorithms that are mainly used in text compression.


REFERENCES

1. D. A. Huffman, “A Method for the Construction of Minimum-Redundancy Codes,” Proceedings of the IRE, Vol. 40, No. 9, pp. 1098-1101, 1952.

2. R. G. Gallager, “Variations on a Theme by Huffman,” IEEE Transactions on Information Theory, Vol. IT-24, No. 6, pp. 668-674, November 1978.

3. R. Hunter and A. H. Robinson, “International Digital Facsimile Standard,” Proceedings of the IEEE, Vol. 68, No. 7, pp. 854-867, 1980.

4. I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic Coding for Data Compression,” Communications of the ACM, Vol. 30, No. 6, June 1987.

5. W. B. Pennebaker, J. L. Mitchell, G. G. Langdon, Jr., and R. B. Arps, “An Overview of the Basic Principles of the Q-Coder Adaptive Binary Arithmetic Coder,” IBM Journal of Research and Development, Vol. 32, pp. 717-726, November 1988.

6. S. Lei, “Efficient Multiplication-Free Arithmetic Codes,” IEEE Transac- tions on Communications, Vol. 43, No. 12, December 1995.

7. B. Fu and K. K. Parhi, “Generalized Multiplication-Free Arithmetic Codes,” IEEE Transactions on Communications, Vol. 45, No. 5, May 1997.

8. W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard. Chapman & Hall, New York, 1993.

9. ISO/IEC JTC 1/SC 29/WG 1 N1878, “JPEG2000 Verification Model 8.5 (Technical Description),” September 13, 2000.

10. J. Ziv and A. Lempel, “A Universal Algorithm for Sequential Data Compression,” IEEE Transactions on Information Theory, Vol. IT-23, No. 3, pp. 337-343, May 1977.

11. J. Ziv and A. Lempel, “Compression of Individual Sequences via Variable-Rate Coding,” IEEE Transactions on Information Theory, Vol. IT-24, No. 5, pp. 530-536, September 1978.

12. T. Welch, “A Technique for High-Performance Data Compression,” IEEE Computer, Vol. 17, No. 6, 1984.

13. J. A. Storer and T. G. Szymanski, “Data Compression via Textual Substitution,” Journal of the ACM, Vol. 29, pp. 928-951, 1982.
