
ISIT 2003, Yokohama, Japan, June 29 - July 4, 2003

Lossless Data Compression with Error Correcting Codes

Giuseppe Caire, Institut Eurécom, 06904 Sophia-Antipolis, France (giuseppe.caire@eurecom.fr)
Shlomo Shamai, Technion, Haifa 32000, Israel (sshlomo@ee.technion.ac.il)
Sergio Verdú, Princeton University, Princeton, NJ 08544, USA (verdu@ee.princeton.edu)

Existing zero-error variable-length data compression algorithms suffer from sensitivity to transmission errors and from error propagation across packets when used for packet-based data transmission over noisy channels. We propose a new approach to lossless data compression based on error correcting codes and the block-sorting transform.

It has long been recognized that linear source codes achieve the entropy rate of memoryless sources, and that the parity-check matrix of an error correcting code, coupled with its associated syndrome decoder, can be used as the linear compressor and decompressor, respectively. This basic scheme is a constructive approach to Shannon's almost-noiseless fixed-length data compression, which, despite its theoretical importance, has had no impact in the practical world. Indeed, the reported embodiments of this approach with actual error correcting codes have not been able to compete favorably with existing data compression algorithms such as LZ. Not only have the block error rates been disappointingly high but, more importantly, the scope of this approach has been limited to memoryless sources (biased coins, for the most part) whose statistics are known to the decompressor.
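The syndrome-based compressor described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy parity-check matrix below is that of the Hamming(7,4) code, chosen only for its small size, and real embodiments would use much longer codes.

```python
import numpy as np

def syndrome_compress(x, H):
    """Compress a binary block x to its syndrome s = H x (mod 2).

    H is an (m x n) parity-check matrix with m < n, so the n-bit
    block is described by only m syndrome bits.
    """
    return (H @ x) % 2

# Toy example: the 3x7 parity-check matrix of the Hamming(7,4) code
# compresses a 7-bit block to 3 syndrome bits.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
x = np.array([1, 0, 1, 1, 0, 0, 1])
s = syndrome_compress(x, H)
print(s.tolist())  # → [1, 0, 0]
```

The decompressor's task is then syndrome decoding: find the most probable source block consistent with the received syndrome, which the paper performs with a modified Belief-Propagation decoder.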

In this paper, we propose a new approach that overcomes those shortcomings. The basic encoding scheme (Figure 1) takes a fixed-length block of data and passes it through the Burrows-Wheeler (block-sorting) transform. This is a one-to-one transformation that outputs the last column of the square matrix formed by lexicographically sorting all the cyclic shifts of the original sequence. A key property of the BWT output [1] is that (as the blocklength grows) it is asymptotically piecewise i.i.d. (for stationary ergodic tree sources), with the length, location, and distribution of the i.i.d. segments depending on the statistics of the source. The segmentation block detects those segments whose entropy exceeds a certain threshold, and those data segments are passed uncompressed to the decompressor. The remaining segments are multiplied by the parity-check matrix of a channel code designed for a nonstationary binary symmetric channel (block labelled 'Syndrome'). In our preferred embodiment we use an irregular Low-Density Parity-Check (LDPC) matrix. The decompressor uses the iterative Belief Propagation (BP) algorithm, modified to take into account that the parity-check equations now take the values of the compressed bits. The output is merged with the uncompressed segments and, finally, the inverse Burrows-Wheeler transform is taken. Each of the building blocks can be implemented in linear time.
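The forward and inverse block-sorting transform can be sketched as follows. This is a naive O(n² log n) illustration using a sentinel character to mark the original row; the linear-time implementations the paper assumes use suffix sorting and an explicit row index instead.

```python
def bwt(s, sentinel="$"):
    """Burrows-Wheeler transform: the last column of the
    lexicographically sorted cyclic rotations of s. The sentinel
    marks the original rotation so the transform is invertible."""
    s = s + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)

def ibwt(r, sentinel="$"):
    """Inverse BWT by repeated sort-and-prepend (for illustration;
    the inverse is also computable in linear time)."""
    table = [""] * len(r)
    for _ in range(len(r)):
        table = sorted(c + row for c, row in zip(r, table))
    orig = next(row for row in table if row.endswith(sentinel))
    return orig[:-1]

text = "banana"
t = bwt(text)
print(t)              # → "annb$aa"
assert ibwt(t) == text
```

Note how the transform groups symbols that share a context ("nn", "aa"), which is the mechanism behind the asymptotically piecewise-i.i.d. property exploited by the segmentation block.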

Key to the favorable performance of our scheme is the fact that the encoder can run an exact copy of the iterative BP algorithm. The block labelled “Iterative Doping” is present at the compressor and communicates the actual values of the bits that achieve the lowest reliability at certain iterations of the BP algorithm. Performance is further improved by having a library of parity-check matrices and tuning the choice of the matrix to the source realization.
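The bit-selection step inside "Iterative Doping" can be sketched as follows. This is an assumed, simplified stand-in: it only shows how the least reliable positions would be identified from the BP soft outputs (log-likelihood ratios), not the full BP decoder or the schedule of doping iterations.

```python
import numpy as np

def doping_positions(llrs, k):
    """Return the k least reliable bit positions, i.e. those with the
    smallest |LLR|. Because the encoder runs an exact copy of the BP
    decoder, it can identify these positions itself and transmit
    their true bit values to the decompressor."""
    return np.argsort(np.abs(llrs))[:k]

# Hypothetical LLRs after some BP iteration (illustrative values):
llrs = np.array([3.2, -0.1, 0.8, -2.5, 0.05])
pos = doping_positions(llrs, 2)
print(sorted(pos.tolist()))  # → [1, 4]
```

Pinning the least reliable bits to their true values is what restarts BP convergence when the decoder would otherwise stall, at the cost of a few extra transmitted bits.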


Figure 1: Basic Compression/Decompression Scheme

Figure 2: Histogram of compression rate (bits/symbol) for a four-state Markov source, H(X)=0.4935, n=10000

We have implemented a universal version of the scheme using an adaptive segmentation module that learns an approximation to the source tree in linear time via the Minimum-Description-Length principle. A coarse version of both the segmentation and the distributions within each segment is communicated to the decoder, which then iteratively refines the statistical model in conjunction with the Belief-Propagation algorithm. Figure 2 shows the histogram of compression achieved by the universal algorithm for a four-state Markov chain, in comparison with LZ (gzip) and a conventional BWT-based compressor (bzip). Although the comparison is quite encouraging, the main motivation and advantages of our scheme over the conventional separation-based approach accrue in the joint source/channel setting, which will be reported elsewhere.
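The role of the segmentation module can be illustrated with a much cruder stand-in than the paper's MDL learner: cut the BWT output into fixed-length pieces and flag those whose empirical (plug-in) entropy exceeds the threshold, since those are the segments the scheme forwards uncompressed. Everything here (fixed segment length, the threshold value) is an assumption for illustration; the actual module learns segment boundaries and statistics adaptively.

```python
from collections import Counter
from math import log2

def empirical_entropy(seg):
    """Plug-in entropy estimate (bits/symbol) of a segment."""
    n = len(seg)
    return -sum(c / n * log2(c / n) for c in Counter(seg).values())

def split_by_entropy(bits, seg_len, threshold):
    """Cut bits into fixed-length segments and flag each segment
    whose empirical entropy exceeds the threshold (those would be
    passed uncompressed rather than syndrome-compressed)."""
    segs = [bits[i:i + seg_len] for i in range(0, len(bits), seg_len)]
    return [(seg, empirical_entropy(seg) > threshold) for seg in segs]

low = [0] * 19 + [1]   # highly biased segment: entropy ≈ 0.29 bits
high = [0, 1] * 10     # balanced segment: entropy = 1 bit
for seg, uncompressed in split_by_entropy(low + high, 20, 0.9):
    print(uncompressed)  # → False, then True
```

Only the low-entropy segment would be multiplied by the parity-check matrix; the balanced one is cheaper to send as-is than to protect with a near-rate-1 code.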

REFERENCES

[1] M. Effros, K. Visweswariah, S. Kulkarni, and S. Verdú, "Data compression based on the Burrows-Wheeler transform: Analysis and optimality," IEEE Trans. on Information Theory, vol. 48, pp. 1061-1081, May 2002.


0-7803-7728-1/03/$17.00 ©2003 IEEE.
