Block turbo codes : towards implementation


HAL Id: hal-02465657

https://hal.archives-ouvertes.fr/hal-02465657

Submitted on 4 Feb 2020



Sylvie Kerouedan, Patrick Adde

To cite this version:

Sylvie Kerouedan, Patrick Adde. Block turbo codes : towards implementation. ICECS 2001 : 8th IEEE International Conference on Electronics, Circuits and Systems, Sep 2001, Malta, Malta. pp.1219-1222, 10.1109/ICECS.2001.957434. hal-02465657


Block Turbo Codes: towards implementation.

Sylvie Kerouédan*, Patrick Adde**

ENST Bretagne, BP 832, 29285 Brest Cedex, France * Sylvie.kerouedan@enst-bretagne.fr

** Patrick.adde@enst-bretagne.fr

Abstract

This paper presents two implementations of the same block turbo decoding algorithm: on the one hand, an elementary decoder associated with a sequencer performs the complete turbo decoding process; on the other hand, the circuit contains one elementary decoder per half-iteration. The choice of parameters for each implemented algorithm brings the results more or less close to the theoretical limit. First, we briefly describe the iterative process which creates the "turbo" effect and explain the essential choices made to adapt the algorithm to an ASIC implementation. Then we describe the two prototypes.

Keywords

Turbo code, block turbo codes, iterative decoding, turbo decoder, SISO algorithm, product code.

I. INTRODUCTION

The concept of turbo codes, first introduced by C. Berrou [1], involves the iterative decoding of two concatenated convolutional codes, or of block codes as shown by R. Pyndiah [2]. The iterative process, based on a SISO (soft input-soft output) algorithm, exhibits performance close to the Shannon limit. We present here the implementation of BTC (block turbo codes), which are especially attractive for applications requiring high code rates or small blocks of data bits.

II. BLOCK TURBO CODE.

In 1972, Chase proposed an algorithm [3] which approximates the optimum sequence decoding of block codes with low computational complexity and a small performance degradation. This algorithm yields the maximum likelihood sequence D for a given soft input R. In 1994, R. Pyndiah [2] supplemented this algorithm to compute the soft decisions associated with the maximum likelihood sequence D, which give a measure of the reliability of each component of D. This reliability function is represented by the Log Likelihood Ratio (LLR) of the decision $d_j$ ($j$th element of D). The iterative turbo decoding process can be achieved by cascading several elementary decoders as illustrated in Fig. 1, where k represents the current half-iteration,

1. a vector is a row or a column of a product code depending on the current half-iteration,

2. R is the received vector,

3. W(k) is the vector which contains the extrinsic information (the difference between the output information and the input information) given by the previous decoder concerning the reliability of the decoded bit,

4. $R'(k) = R + \alpha(k)\,W(k-1)$,

5. $\alpha(k)$ and $\beta(k)$ ($\beta$ has a role in the reliability computation) are constants determined by simulations.

Fig. 1: Block diagram of an elementary decoder.

The "optimal" turbo decoding algorithm of linear block codes follows these steps:

1. Search for the m least reliable binary symbols of R'; their positions are called $I_1, I_2, \ldots, I_m$, and their reliabilities are called $MF_1, MF_2, \ldots, MF_m$,

2. Generate $\tau$ test sequences $T^Q$ which are combinations of elementary test vectors, each having a "1" in position $I_j$ and "0" elsewhere,

3. For each test word $T^Q$, compute $Z^Q$: $Z^Q = T^Q \oplus \mathrm{sign}(R')$,

4. Decode $Z^Q$ by the Berlekamp algorithm (result $C^Q$),

5. For each vector $C^Q$, compute the squared Euclidean distance $M^Q$ between R' and $C^Q$, and sort them from the smallest metric to the largest,

6. The code word $C^d$ having the minimal distance ($M^d$) with R' is the result of binary decoding: $D = C^d$,

7. Compute the reliability $F_j$ for each element $d_j$ of D; this involves searching for a code word $C^c$ (called the concurrent code word) such that $c^c_j \neq c^d_j$ and which has a minimal squared Euclidean distance with $C^d$. If there is no concurrent code word, the reliability is fixed to the current $\beta$,

8. Compute the extrinsic information $W_j = d_j F_j - r'_j$.
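The eight steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the circuit described in this paper: the Berlekamp decoder is replaced by a syndrome decoder for a small extended Hamming code (a hypothetical stand-in), the test sequences are all $2^m$ combinations of the least reliable positions, the concurrent word is taken as the next-best candidate in the metric ranking of step 5, and the reliability formula $F_j = (M^c - M^d)/4$ is borrowed from the Chase-Pyndiah formulation of [2]. All function names are illustrative.

```python
from itertools import combinations

def hard_decode(bits):
    """Stand-in algebraic decoder: extended Hamming code, NOT Berlekamp.

    bits[:-1] is a Hamming word whose 1-based position is its parity-check
    column; bits[-1] is the overall parity bit.
    """
    word = list(bits)
    syndrome = 0
    for pos, b in enumerate(word[:-1], start=1):
        if b:
            syndrome ^= pos
    if syndrome:
        word[syndrome - 1] ^= 1        # correct the single detected error
    word[-1] = sum(word[:-1]) % 2      # rebuild the overall parity bit
    return word

def elementary_decoder(r_prime, m=3, beta=1.0):
    n = len(r_prime)
    y = [1 if x > 0 else 0 for x in r_prime]           # sign of R' as bits
    # Step 1: the m least reliable positions
    least = sorted(range(n), key=lambda j: abs(r_prime[j]))[:m]
    candidates = {}
    for size in range(m + 1):                          # Step 2: test sequences
        for combo in combinations(least, size):
            z = list(y)
            for j in combo:
                z[j] ^= 1                              # Step 3: Z = T xor sign(R')
            c = tuple(hard_decode(z))                  # Step 4: algebraic decoding
            # Step 5: squared Euclidean distance, bits mapped to -1/+1
            candidates[c] = sum((x - (2 * b - 1)) ** 2 for x, b in zip(r_prime, c))
    ranked = sorted(candidates.items(), key=lambda kv: kv[1])
    d, m_d = ranked[0]                                 # Step 6: decision D
    w = []
    for j in range(n):
        # Step 7: best-ranked candidate that differs from D in position j
        m_c = next((mc for c, mc in ranked[1:] if c[j] != d[j]), None)
        f_j = beta if m_c is None else (m_c - m_d) / 4.0
        w.append((2 * d[j] - 1) * f_j - r_prime[j])    # Step 8: extrinsic
    return list(d), w

# All-zero code word of the (8,4) stand-in code, sent as -1 and perturbed
r_prime = [-1.0, -0.8, 0.2, -1.1, -0.9, -0.3, -1.2, -0.7]
decision, extrinsic = elementary_decoder(r_prime, m=3, beta=1.0)
```

In this example the single sign error at index 2 is corrected, and positions without a concurrent word fall back to the reliability $\beta$.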

III. PERFORMANCE AND COMPLEXITY.

It is not possible to implement such a complex algorithm without any simplifications. Indeed, increasing the number of test sequences gives better results but dramatically increases the storage capacity required for the whole process, which is why the algorithm was simplified by P. Adde before its implementation on programmable circuits (Xilinx FPGA) [6]. The most significant parameters which act on the complexity of the circuit are the number of iterations, the number of concurrent words, the number of test sequences and the number of quantization bits for data. In [5], [6] and [7] we compare the performance of the algorithm with different parameters, which we can sum up in a few words here:

+ Processing with 3 concurrent words rather than with 1 concurrent word increases the area by 13.5% for a gain of 0.1 dB. We estimate that the area increases by 60% with the "16 concurrent words" solution for a gain of only 0.2 dB.

+ If the number of quantization bits is equal to or less than 3, simulations show a significant degradation. Using 5 bits instead of 4 bits brings a gain in the range of 0.1 dB for an increase of the area by about 15%.

+ The "16 test vectors" solution, compared with the "8 test vectors" solution, increases the area by 4% for a gain of about 0.07 dB. The "32 test vectors" solution increases the area by 12% with respect to the "16 test vectors" solution for a gain of only 0.04 dB.

For our implementations, we fix the different parameters as described in TAB.1. These parameters modify the complexity and the performance of the elementary decoder shown in Fig. 1.

TAB.1: Different parameters for the two prototypes which implement the decoding of the product code BCH(32,26,4) x BCH(32,26,4).

| Parameter | Prototype A | Prototype B |
|---|---|---|
| Quantization bits | 4 | 5 |
| Number of iterations | 4 | 7.5 |
| Number of test sequences | 8 | 16 |
| Number of concurrent words | 1 | 3 |
| $\alpha$ | Programmable parameter depending on the current half-iteration | Programmable parameter depending on the current half-iteration |
| $\beta$ | Programmable parameter depending on the current half-iteration | Adaptive coefficient depending on the received word [8] |
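The area and gain figures quoted in this section can be compared with a short calculation. The sketch below ranks each option by coding gain per percent of added area; this figure of merit is an assumption made here for illustration and is not a metric used in the paper. The baseline of the "16 concurrent words" figure is not stated explicitly in the text, so it is listed without one.

```python
# (option, area increase in %, gain in dB), values as quoted in the text
options = [
    ("3 concurrent words instead of 1", 13.5, 0.10),
    ("16 concurrent words", 60.0, 0.20),
    ("5 quantization bits instead of 4", 15.0, 0.10),
    ("16 test vectors instead of 8", 4.0, 0.07),
    ("32 test vectors instead of 16", 12.0, 0.04),
]

def gain_per_area(area_pct, gain_db):
    """Hypothetical figure of merit: dB gained per percent of extra area."""
    return gain_db / area_pct

ranked = sorted(options, key=lambda o: gain_per_area(o[1], o[2]), reverse=True)

for name, area_pct, gain_db in ranked:
    print(f"{name}: {gain_per_area(area_pct, gain_db):.4f} dB per % area")
```

With these numbers, moving from 8 to 16 test vectors gives the best return, which is consistent with prototype B adopting 16 test sequences.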

The iterative process is achieved by rebuilding the matrix after each elementary decoding (or half-iteration), as described in Fig. 2.

Fig. 2: Decoding structure.
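The scheduling of half-iterations can be sketched as follows. This is a minimal illustration that assumes rows are decoded first and that the extrinsic matrix starts at zero; `turbo_decode`, `transpose` and `hard_limiter` are hypothetical names, and `hard_limiter` is only a placeholder with no error correction, standing in for a real elementary decoder so that the loop can be run.

```python
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def hard_limiter(vector, beta):
    """Placeholder elementary decoder: sign decision, reliability fixed to beta."""
    d = [1 if x > 0 else 0 for x in vector]
    w = [(2 * b - 1) * beta - x for b, x in zip(d, vector)]
    return d, w

def turbo_decode(r, elementary_decoder, alphas, betas):
    """Run one half-iteration per (alpha, beta) pair over the received matrix r."""
    n_rows, n_cols = len(r), len(r[0])
    w = [[0.0] * n_cols for _ in range(n_rows)]    # no extrinsic information yet
    decision = None
    for k, (alpha, beta) in enumerate(zip(alphas, betas)):
        # R'(k) = R + alpha(k) * W(k-1), element by element
        r_prime = [[r[i][j] + alpha * w[i][j] for j in range(n_cols)]
                   for i in range(n_rows)]
        by_rows = (k % 2 == 0)                     # assumed order: rows, then columns
        vectors = r_prime if by_rows else transpose(r_prime)
        results = [elementary_decoder(v, beta) for v in vectors]
        d = [res[0] for res in results]
        w_new = [res[1] for res in results]
        if not by_rows:                            # rebuild the matrix in row order
            d, w_new = transpose(d), transpose(w_new)
        decision, w = d, w_new
    return decision

r = [[0.5, -1.0], [-0.2, 0.8]]
decision = turbo_decode(r, hard_limiter, alphas=[0.0, 0.3], betas=[1.0, 1.0])
```

Passing 15 pairs of coefficients would correspond to the 7.5 iterations of prototype B, and 8 pairs to the 4 iterations of prototype A.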

A first analysis of the block turbo decoder algorithm shows that there are two different structures which can be adopted for hardware implementation [4][9].


The first solution, prototype A developed in 1997 [10], is a modular structure where a module integrates the elementary decoder illustrated in Fig. 1. Several modules are then pipelined to realize the block turbo decoder. In the second solution, prototype B developed in 2000 [11], several iterations are realized by a single circuit, the core of this solution being a single elementary decoder.
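The difference between the two structures can be made concrete by counting elementary decoders and passes, using the iteration counts of TAB.1. The helper names below are illustrative only.

```python
def half_iterations(iterations):
    """One full iteration is one row pass plus one column pass."""
    return int(2 * iterations)

def elementary_decoders(iterations, modular):
    """Modular structure: one decoder per half-iteration; otherwise a single one."""
    return half_iterations(iterations) if modular else 1

def passes_per_decoder(iterations, modular):
    """Number of times each elementary decoder processes a given matrix."""
    return 1 if modular else half_iterations(iterations)

# Prototype A: 4 iterations, modular; prototype B: 7.5 iterations, single decoder
proto_a = (elementary_decoders(4, modular=True), passes_per_decoder(4, modular=True))
proto_b = (elementary_decoders(7.5, modular=False), passes_per_decoder(7.5, modular=False))
print(proto_a, proto_b)
```

The count of 8 decoders for prototype A agrees with the two breadboards described in section 3.2, which carry two and six decoders.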

3.1 Elementary decoder architecture

Following the steps of the algorithm described above, we consider the elementary decoder, described in Fig. 1, using the architecture illustrated in Fig. 3. We distinguish five parts in the circuit, working in parallel thanks to a sequencer:

+ the sequential input part, which concerns all the functions where calculations progress at the rate of the input symbols (a counter used for the timing of the elementary decoder, the parity and syndrome computing sub-blocks, and the processing unit that determines the m least reliable binary symbols);

+ the algebraic decoding part, which determines the optimum code word for a given input binary vector;

+ the selection part, which selects the maximum likelihood code word (that is, the word having the minimal distance from R'(k)) and the concurrent code word when it exists;

+ the output sequential part, which computes the extrinsic information, where calculations are performed at the same rate as the output symbols;

+ 2 storage elements, which are divided into 3 RAMs in order to perform calculations in parallel.
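The sequential input part lends itself to a symbol-by-symbol formulation. The sketch below is a software analogy, not the circuit: it updates the parity, a syndrome and the list of the m least reliable symbols as each symbol arrives. The syndrome shown is that of a Hamming-style code (XOR of the 1-based positions of the bits set to 1, last symbol excluded), which is an assumption made for illustration; the BCH syndrome of the actual decoder is computed differently.

```python
def sequential_input(symbols, m):
    """Process soft symbols one at a time, as the input part would per clock."""
    least = []       # up to m (reliability, position) pairs, sorted ascending
    parity = 0
    syndrome = 0
    last = len(symbols) - 1
    for pos, x in enumerate(symbols):            # pos plays the role of the counter
        bit = 1 if x > 0 else 0
        parity ^= bit                            # running overall parity
        if bit and pos != last:
            syndrome ^= pos + 1                  # Hamming-style syndrome (assumption)
        reliability = abs(x)
        i = len(least)
        while i > 0 and least[i - 1][0] > reliability:
            i -= 1                               # find the insertion point
        least.insert(i, (reliability, pos))
        if len(least) > m:
            least.pop()                          # keep only the m least reliable
    return least, parity, syndrome

symbols = [-1.0, -0.8, 0.2, -1.1, -0.9, -0.3, -1.2, -0.7]
least, parity, syndrome = sequential_input(symbols, m=3)
```

Keeping only m sorted entries means the storage for this search does not grow with the code length, which is one reason this kind of running search suits a streaming implementation.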

Fig. 3: Block diagram of the elementary decoder.

3.2 Prototype A

The elementary decoder, described above, was implemented in an FPGA circuit (Xilinx 4010), one per half-iteration. It was validated by VHDL functional simulations, which were compared with initial simulations (C programs). The logic schematics were obtained from VHDL modeling, using automatic synthesis. The noisy data are generated by a C program, stored in a large RAM (4x256 kbits) and selected randomly. Two breadboards were used:

1. a first one with the encoder, the Gaussian channel simulator and two decoders for the first iteration,

2. a second one, with six decoders for the three following iterations. This second circuit can be duplicated if necessary to increase the number of iterations.

The data rate is about 6Mbps and the experimental results are shown in Fig.6.
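The generation of noisy data for such a test bench can be sketched as follows. The paper states only that a C program produced the noisy data, so this Python version is a hypothetical equivalent: it assumes antipodal signalling with symbols -1 and +1, and derives the noise standard deviation from $E_b/N_0$ and the code rate of the product code, $(26/32)^2$.

```python
import math
import random

def awgn_sigma(ebn0_db, code_rate):
    """Noise standard deviation for unit-energy antipodal symbols."""
    ebn0 = 10 ** (ebn0_db / 10)
    return math.sqrt(1.0 / (2.0 * code_rate * ebn0))

def transmit(bits, ebn0_db, code_rate, rng):
    """Map bits to -1/+1 and add white Gaussian noise."""
    sigma = awgn_sigma(ebn0_db, code_rate)
    return [(2 * b - 1) + rng.gauss(0.0, sigma) for b in bits]

rate = (26 / 32) ** 2            # BCH(32,26,4) x BCH(32,26,4) product code
rng = random.Random(2001)        # fixed seed so that the run is repeatable
received = transmit([0, 1, 1, 0], ebn0_db=3.0, code_rate=rate, rng=rng)
```

The received values would then be quantized (on 4 or 5 bits according to TAB.1) before being fed to the decoder; that step is left out of this sketch.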

Fig. 5: Principle of the complete turbo decoding on a single chip (Prototype B).

3.3 Prototype B

As illustrated in Fig. 5, the architecture adopted to implement this solution consists of 3 blocks. The first one, the input unit, is used to store incoming data in the matrices. During the reception of one matrix, the previous matrix (when there is one) is sent to the processing unit, which is in charge of the turbo decoding. In this second block, 7.5 iterations are performed. Then the decoded matrix is sent to the output unit, which returns the binary decision. The whole system was described in VHDL, simulated and compared with the results of C program simulations. The circuit is implemented in a Xilinx FPGA (Virtex technology, XCV200). It is important to note that this implementation required 6 storage elements, against only 2 for prototype A. The data rate is slightly less than 1 Mbps for 7.5 iterations and the experimental results are shown in Fig. 6.

Fig. 6: Results of the two implementations compared to the theoretical limit calculated in [9].

IV. CONCLUSION

As can be seen in Fig. 6, the results of the second prototype are better than those of the first. Nevertheless, choosing prototype B as the best solution could be a mistake, because the differences between the two solutions result in particular from the choice of parameters for the elementary decoder, as discussed in section III (TAB.1). It is entirely realistic to implement the more efficient elementary decoder in the modular solution. In this case, the experimental results will certainly be identical for both prototypes. The choice between the two implementations of block turbo decoders will then depend on the usual compromise between area and speed.

Our solution is not the only way to implement a block turbo decoder: we note the work of Dave et al. [12] and S. Robert [13], who choose a different algorithm or a different product code. What we demonstrate here is that the experimental results corroborate the simulation results and illustrate the real efficiency of block turbo decoding for future applications.

V. REFERENCES

[1] C. Berrou, A. Glavieux and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes", IEEE Int. Conf. on Comm. ICC'93, May 1993, vol. 2/3, pp. 1064-1071.

[2] R. Pyndiah, A. Glavieux, A. Picart and S. Jacq, "Near optimum decoding of product codes", in Proc. of IEEE GLOBECOM '94 Conference, Nov.-Dec. 1994, San Francisco, vol. 1/3, pp. 339-343.

[3] D. Chase, "A class of algorithms for decoding block codes with channel measurement information", IEEE Trans. Inform. Theory, Jan. 1972, vol. IT-18, pp. 170-182.

[4] P. Adde, R. Pyndiah, O. Raoul and J.R. Inisan, "Block turbo decoder design", Int. Symposium on Turbo Codes and Related Topics, Brest, Sept. 1997, pp. 166-169.

[5] P. Adde, R. Pyndiah, O. Raoul, "Performance and complexity of block turbo decoder circuits", Third International Conference on Electronics, Circuits and Systems ICECS'96, 13-16 Oct. 1996, Rodos, Greece, pp. 172-175.

[6] S. Kerouédan, P. Adde, P. Ferry, "Comparaison performances/complexité de décodeurs de codes BCH utilisés en turbo-décodage", GRETSI'99, 13/17 Sept. 1999.

[7] P. Adde, R. Pyndiah, F. Buda, "Design and performance of a product code turbo encoding-decoding prototype", Annals of Telecommunications, vol. 54, no. 3-4, pp. 214-219.

[8] O. Raoul, P. Adde and R. Pyndiah, "Architecture et conception d'un circuit turbo-décodeur de codes produits", GRETSI'95, Juan-les-Pins, Sept. 1995, pp. 981-984.

[9] P. Adde, R. Pyndiah, "Recent simplification and improvement of block turbo code", 2nd International Symposium on Turbo Codes and Related Topics, 3/7 Sept. 2000, Brest.

[10] P. Adde, R. Pyndiah, J. R. Inisan and Y. Sichez, "Conception d'un turbo décodeur de code produit", GRETSI'97, Sept. 1997, Grenoble, pp. 1169-1172.

[11] S. Kerouédan and P. Adde, "Implementation of a block turbo decoder in a single chip", 2nd International Symposium on Turbo Codes and Related Topics, 3/7 Sept. 2000, Brest.

[12] S. Dave, J. Kim and S. C. Kwatra, "An efficient decoding algorithm for Block Turbo Codes", IEEE Trans. on Communications, Jan. 2001, vol. 49, no. 1, pp. 41-46.

[13] S. Robert, "Les turbo codes en blocs", séminaire