
(1)

Some recent work…

Equipe CSN by Bertrand LE GAL

(bertrand.legal@ims-bordeaux.fr)

 

Laboratoire IMS - UMR CNRS 5218, Institut Polytechnique de Bordeaux

Université de Bordeaux 1, France

7 February 2013

31.05.2010


The Université de Bordeaux is publishing a comparative study of green campus initiatives conducted internationally, a valuable source of information and reflection for the development of its new University model within the framework of the Opération campus. It has chosen to release this study in open access, sustainable development being everyone's business and in everyone's interest.

The Université de Bordeaux has committed to building a new University model and, in parallel, to becoming a leader in sustainable development. It is in this spirit that, in early 2009, it responded favorably, jointly with the Université Bordeaux 1 Sciences Technologies, to the proposal from Ecocampus-Nobatek and EDF: to compile feedback and analyses on green campus projects in France, Europe and North America.

The objective of this study (see next page) was to observe and capture the good practices and exemplary actions relating to the main pillars of sustainable development: economic, social, environmental and organizational. The Université de Bordeaux will draw on it to implement long-term governance and strategy in the service of a more livable and more equitable campus for the whole university community.

With the Grenelle de l'environnement as a benchmark to reach and then exceed, the Université de Bordeaux intends to become a pilot site through a comprehensive sustainable development approach based on:
- the permanent integration of human dimensions into the real-estate and planning project (accessibility, health, legibility, comfort, living environment);

- a radical energy transformation of the buildings as part of their renovation under the HQE® approach, and an energy master plan for a maximal reduction of greenhouse gases;
- the enhancement and protection of a park on the Talence-Pessac-Gradignan university site, a veritable green lung at the scale of the metropolitan area and an exceptional asset for users' quality of life and for the development of biodiversity in an urban environment;

- a mobility plan covering all areas of the university campus, in order to reduce individual car use and its impact by relying on high-performance public transport networks and the development of alternative modes;

- a concerted opening toward the city, aimed at fostering the economic development of the surrounding territories and of campus life, and at creating social and functional diversity;

- and finally, a sine qua non condition of success, the establishment of an information and consultation process involving all members and stakeholders of the University, for a shared understanding of the issues and the learning of responsible behaviors.

The Université de Bordeaux therefore intends to draw up an Agenda 21 and to make its campus an experimentation site for developing innovative approaches based on the expertise of its laboratories.

The study « Initiatives campus verts » can be downloaded from www.univ-bordeaux.fr

Press contacts, Université de Bordeaux
Anne SEYRAFIAN . Norbert LOUSTAUNAU . T +33 (0)5 56 33 80 84 . communication@univ-bordeaux.fr

Contact Nobatek-Ecocampus
Julie CREPIN, project manager . T +33 (0)5 56 84 63 72 . jcrepin@nobatek.com

Université de Bordeaux

Toward a new SUSTAINABLE University model

(2)

Equipe CSN - Design Group Workshop

B. Le Gal, 22 November 2013

Sorry…


(3)


Washing machine… number of iterations required vs. SNR


JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 10

[Figure] Fig. 12. Throughput performance of LDPC decoders depending on the number of decoding iterations (air throughput in Mbps vs. number of layered decoding iterations, 3 to 25; one curve per code: 1944x972, 2048x384, 4000x2000, 4896x2448, 8000x4000, 9972x4986, 20000x10000, 64800x25600, 64800x31400).

[Figure] Fig. 13. Decoding latency of LDPC decoders depending on the number of decoding iterations (latency in µs vs. number of layered decoding iterations, 3 to 25; same set of codes as Fig. 12).

Throughputs achieved by the x86 LDPC decoders vary from more than 300 Mbps for 3 decoding iterations down to 60 Mbps for 25 iterations.

Code structure does not impact throughput performance, contrary to GPU-based decoders. Indeed, in related GPU works the Quasi-Cyclic structure of the code can heavily impact throughput [24]. The proposed approach is insensitive to code structure parameters: all the XXX offer very close performances.

Code length (from 2k up to 8k) does not affect throughput. However, this observation no longer holds for longer frame lengths, which decrease throughput. This performance reduction comes from the fact that the decoder memory footprint becomes larger than the processor cache.

Figure 13 provides the decoding latency of the x86 decoders. Latency is very low and varies from 200 µs for 3 decoding iterations up to 1.5 ms for 25 iterations on the DVB-S2 long frame code. Latency increases linearly with the number of decoding iterations. However, the latency increase is strongly impacted by the frame length. In all cases, the x86 decoder

[Figure] Fig. 14. Throughput of the x86 LDPC decoders depending on Eb/N0 (air throughput in Mbps; code rate 1/2, different LDPC matrices and frame lengths from 1944x972 up to 64800x31400).

[Figure] Fig. 15. Decoding latency of the x86 LDPC decoders depending on Eb/N0 (latency in µs; code rate 1/2, different LDPC matrices and frame lengths from 1944x972 up to 64800x31400).

provides short decoding latency (less than 1 ms for frame lengths shorter than or equal to 20k).

The second evaluation focuses on the SNR value. It shows the performance improvement introduced by the early-termination criterion when the SNR level increases. The decoding process was configured to process at most 20 decoding iterations, stopping early as soon as a valid codeword is discovered. Figures 14 and 15 respectively provide the throughput and the decoding latency of the x86 decoder depending on the SNR value.

Figure 14 shows that the…

This set of experiments shows that the x86 implementation of the LDPC decoding process is efficient in terms of throughput and decoding latency. The selected LDPC decoding algorithm and its implementation offer performances (on a single processor core) higher than those required by current communication standards. Moreover, the processing latency introduced by the decoding process remains low (< 1 ms at 25 decoding iterations, except for DVB-S2 long frames). These performances enable the proposed solution to be used in practical SDR (software-defined radio) systems.
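The early-termination control flow discussed above can be sketched with a toy decoder. This is not the paper's min-sum decoder: it is a hard-decision bit-flipping decoder for the (7,4) Hamming code, a hypothetical stand-in chosen only to show the iteration loop exiting as soon as the syndrome check passes instead of always running the configured maximum.

```cpp
#include <cstddef>
#include <vector>

// Hard-decision bit-flipping decoder for the (7,4) Hamming code, used here
// only to illustrate early termination. Returns the number of iterations
// actually spent (0 if the word was already a valid codeword), or -1 on
// decoding failure after max_iter iterations.
int bit_flip_decode(std::vector<int>& bits, int max_iter) {
    const int n = 7, m = 3;
    for (int it = 0; it <= max_iter; ++it) {
        // Syndrome: check c involves the bits j whose column value (j+1)
        // has bit c set -- the classic Hamming parity-check matrix.
        std::vector<int> unsat(m, 0);
        bool valid = true;
        for (int c = 0; c < m; ++c) {
            int parity = 0;
            for (int j = 0; j < n; ++j)
                if (((j + 1) >> c) & 1) parity ^= bits[j];
            unsat[c] = parity;
            if (parity) valid = false;
        }
        if (valid) return it; // early termination: stop on a valid codeword
        // Flip the first bit involved in the largest number of
        // unsatisfied checks.
        int best = 0, best_count = -1;
        for (int j = 0; j < n; ++j) {
            int count = 0;
            for (int c = 0; c < m; ++c)
                if ((((j + 1) >> c) & 1) && unsat[c]) ++count;
            if (count > best_count) { best = j; best_count = count; }
        }
        bits[best] ^= 1;
    }
    return -1;
}
```

With a single bit error the loop stops after one flip, regardless of the configured maximum, which is exactly the throughput gain the early-termination criterion exploits at high SNR.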

(4)


Digital SoCs, a joint architecture


Most SoCs integrate dedicated digital blocks

+ processors

The same goes for FPGA circuits (Xilinx & Altera)

Grouping them within a single chip => performance & reliability

(5)


The best example of these evolutions: your phone!


SoC A7 (iPhone 5S), SoC A6 (iPhone 4S)

Tech. 32 nm, dimensions 9.7 mm x 9.97 mm

(6)


Or your connected tablets…


Sony Tablet S1, NVIDIA Tegra 2 (SoC)

(7)


Digital blocks of different granularities


Implementation of simple digital functions

Implementation of complex digital functions or simple systems

Implementation of complex digital systems

(8)


Different ways of playing with « Lego »


The integration process is constrained by the logic-gate library provided by the foundry.

Algorithm + technology library

(9)


Different ways of playing with « Lego »


(10)


Programmable architectures


(11)


Software LDPC decoders


Objectives of LDPC code simulation

Simulate to evaluate and/or validate the performance (BER) of an LDPC code


[Table] Fig. 7. Memory accesses required by the LDPC decoder: collapsed and uncollapsed reads and writes for each code (576x288 up to 64880x32400), for the flooding (F) and layered (L) schedules.

[Figure] Fig. 8. BER performance of LDPC codes having a code rate of 1/2, different LDPC matrices and different frame lengths (BER vs. Eb/N0; codes 816x408 up to 64800x32400).

is mainly due to the fact that, from an architectural point of view, the cores share the same memory caches and, from a software point of view, synchronization barriers are required to synchronize the processing threads (e.g. to exchange data).

Efficiently implementing an application on this kind of architecture, which provides SM and SIMD parallelism at the same time, is complex. In this section we detail the parallelism levels available in the LDPC decoding algorithm, and then explain the approach used to achieve high throughput.

Implementing an efficient LDPC decoder on such architectures is a challenging task. Indeed, the LDPC decoding algorithm has characteristics that make it inefficient: it requires a large memory set, data reuse is quite low between consecutive computations, the amount of computation per memory access is low, and a large fraction of the memory accesses are irregular.

The LDPC decoding process is memory-consuming and necessitates collapsed and uncollapsed read/write accesses to the memory. The main optimization stage consists in identifying an efficient way to reduce the memory footprint and the number of memory accesses. The constraints on memory optimization differ from previous GPU works because decoding computations are sequential.

[Figure] Fig. 9. BER performance of the 8000x4000 LDPC code depending on the number of iterations (2 to 100 iterations; BER vs. Eb/N0).

However, such functionality is used to boost LDPC decoding performance. The technique used and its associated cost are presented below.

Finally, performance is improved by taking advantage of LDPC code characteristics (the degrees of the VN and CN nodes) to reduce memory accesses, minimize control instructions and improve data locality in registers.
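As an illustration of this kind of degree-driven specialization (a sketch only; the excerpt does not give the decoder's actual kernels), a min-sum-style check-node update can be specialized on the node degree: with a compile-time degree the compiler can fully unroll the loops and keep temporaries in registers, removing the control instructions the generic version needs.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Generic min-sum check-node update: for each edge j, the outgoing
// magnitude is the minimum input magnitude over all edges k != j, and the
// outgoing sign is the product of all input signs except sign(in[j]).
std::vector<float> cn_update_generic(const std::vector<float>& in) {
    const std::size_t d = in.size();
    float min1 = std::numeric_limits<float>::infinity();
    float min2 = min1;   // smallest and second-smallest magnitudes
    std::size_t pos = 0; // edge holding min1
    int sign_prod = 1;   // product of all input signs
    for (std::size_t k = 0; k < d; ++k) {
        const float mag = std::fabs(in[k]);
        if (in[k] < 0.0f) sign_prod = -sign_prod;
        if (mag < min1) { min2 = min1; min1 = mag; pos = k; }
        else if (mag < min2) { min2 = mag; }
    }
    std::vector<float> out(d);
    for (std::size_t j = 0; j < d; ++j) {
        const float mag = (j == pos) ? min2 : min1;
        out[j] = ((in[j] < 0.0f) ? -sign_prod : sign_prod) * mag;
    }
    return out;
}

// Degree-specialized variant: D is a template parameter, so both loops have
// a compile-time trip count that the compiler can fully unroll, and no
// degree array has to be read at run time.
template <int D>
std::array<float, D> cn_update_fixed(const std::array<float, D>& in) {
    float min1 = std::numeric_limits<float>::infinity(), min2 = min1;
    int pos = 0, sign_prod = 1;
    for (int k = 0; k < D; ++k) {
        const float mag = std::fabs(in[k]);
        if (in[k] < 0.0f) sign_prod = -sign_prod;
        if (mag < min1) { min2 = min1; min1 = mag; pos = k; }
        else if (mag < min2) { min2 = mag; }
    }
    std::array<float, D> out;
    for (int j = 0; j < D; ++j)
        out[j] = ((in[j] < 0.0f) ? -sign_prod : sign_prod)
               * ((j == pos) ? min2 : min1);
    return out;
}
```

One instance of `cn_update_fixed` is emitted per degree actually present in the code, which is practical because LDPC codes use only a handful of distinct node degrees.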

These optimization techniques are presented in the following subsections.

A. Memory optimization

The LDPC decoder consumes memory to store the VN values and the messages from VN to CN and vice versa. Moreover, the LDPC decoder needs to store the H matrix information to perform computations over the right data sets. The amount of memory depends on the LDPC code and on the computation scheduling (flooding/layered).

1) Flooding algorithm: The flooding-based algorithm is composed of two main stages. The first one computes the messages from Check Nodes to Variable Nodes and the second stage performs the inverse processing. This algorithm requires at least two memory arrays to store the messages from/to VN


Simulate to evaluate the impact of reducing the number of iterations on performance (BER) => impact on computational complexity
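This kind of BER evaluation can be sketched as follows. The routine below is a minimal illustration, not the team's simulator: it estimates the bit error rate of uncoded BPSK over an AWGN channel by Monte-Carlo simulation (a real study would insert the LDPC encoder/decoder into this loop); the Eb/N0 and bit-count parameters in the usage are arbitrary.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Estimate the uncoded BPSK bit error rate at a given Eb/N0 (in dB) by
// Monte-Carlo simulation: transmit all-zero bits as +1, add Gaussian
// noise, and count how many received samples cross the decision threshold.
double estimate_ber(double ebn0_db, std::size_t n_bits, unsigned seed) {
    const double ebn0 = std::pow(10.0, ebn0_db / 10.0);
    const double sigma = std::sqrt(1.0 / (2.0 * ebn0)); // noise std. dev.
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, sigma);
    std::size_t errors = 0;
    for (std::size_t i = 0; i < n_bits; ++i)
        if (1.0 + noise(rng) < 0.0) ++errors; // sample decoded as a 1
    return static_cast<double>(errors) / static_cast<double>(n_bits);
}
```

Reaching BER levels around 10^-8, as in Figs. 8-11, requires simulating billions of bits per SNR point, which is precisely why decoder throughput dominates simulation time.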

(12)


Software LDPC decoders


Objectives of LDPC code simulation


[Figure] Fig. 10. BER performance of the 8000x4000 LDPC code running 25 iterations with different BP replacement heuristics (parameter values from 0.00 to 0.95; BER vs. Eb/N0).

[Figure] Fig. 11. BER performance of the 8000x4000 LDPC code running 20 iterations with different data word-lengths (floating-point reference and QX.2/QX.3/QX.4 formats with 5/6/4, 5/7/5 and 6/8/6 bit allocations; BER vs. Eb/N0).

from/to CN. The array sizes are specified by the number of messages to exchange, named z. Moreover, to execute the decoding computation, the decoder must know the degree of each CN and VN. Two arrays whose dimensions equal n and m respectively are required.

The node degree information is only required to execute the VN kernel. However, to execute the CN kernel, data interleaving is required. Indeed, each CN must read the right VN messages and must generate messages to the right VNs. A data array of size z is required to store the interleaving rules (the interleavings from VN to CN and from CN to VN are identical).

The word-length of the array elements depends on the values to store. For most LDPC codes, one byte is required to store the CN and VN degrees, two bytes are required for the interleaving rules, and several bytes per message value².

2) Layered algorithm: The layered algorithm is composed of a single processing loop that sequentially:

²In the case of a floating-point decoder, 4 bytes are used. When a fixed-point format is preferred, 1 byte is enough.

1) computes a CN value using its linked VN values and the latest CN-to-VN messages;

2) updates the linked VN values and the CN-to-VN messages from the computed CN value.

This computation scheduling removes the requirement to store the VN-to-CN messages, reducing the memory footprint. All the other data arrays are however still required.
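A back-of-the-envelope comparison of the two schedules can be derived from the storage just described (one degree byte per VN and CN, 2-byte interleaving rules per edge, per-edge messages, per-VN values). The sketch below encodes only those assumptions; the z, n, m values used in the example are hypothetical, not those of a real code.

```cpp
#include <cstddef>

// Approximate memory footprint (bytes) of a flooding-schedule decoder.
// z = number of edges (messages), n = number of VNs, m = number of CNs,
// msg_bytes = bytes per stored message/value (4 for float, 1 fixed-point).
std::size_t flooding_footprint(std::size_t z, std::size_t n, std::size_t m,
                               std::size_t msg_bytes) {
    return 2 * z * msg_bytes   // VN-to-CN and CN-to-VN message arrays
         + n * msg_bytes       // VN values
         + 2 * z               // interleaving rules (2 bytes per edge)
         + n + m;              // VN and CN degree arrays (1 byte each)
}

// The layered schedule drops the VN-to-CN message array; everything else
// is kept.
std::size_t layered_footprint(std::size_t z, std::size_t n, std::size_t m,
                              std::size_t msg_bytes) {
    return z * msg_bytes + n * msg_bytes + 2 * z + n + m;
}
```

For a hypothetical rate-1/2 code with n = 8000, m = 4000 and z = 24000 edges stored in 32-bit floating point, this gives about 284 000 bytes for flooding versus 188 000 bytes for layered, illustrating why the layered schedule fits the processor caches more easily.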

B. Using SIMD capability

The SIMD capability provided by x86 architectures makes it possible to perform identical computations on multiple data sets. Several parallelism levels can be exploited in LDPC decoding:

The first parallelism level is located inside node computation: each node (variable node or check node) requires similar computations to compute its value from incoming messages and to update outgoing messages. However, the node degree (number of messages) is rarely a power of two and is mostly irregular. These LDPC code characteristics rule out SIMD optimization at this computation level.

The second parallelism level is located at the node computation level: each node (variable node or check node) can be computed independently of the others. Using a SIMD approach, it becomes possible to speed up node computations. This approach was mainly used in GPU-related works to speed up the decoding process. However, it necessitates that the node degrees are identical (false for irregular codes) and, moreover, due to data dependencies, it is hardly usable in layered decoders.

The third parallelism level is located at the codeword level: different codewords can be decoded simultaneously. Indeed, all codewords are decoded using the same computation sequence over different data sets. The SIMD paradigm enables parallel processing, but increases decoding latency. Latency is not an issue in a simulation context.

Depending on the SIMD instruction set available on the x86 architecture, it becomes possible to process:

- 4 codeword decodings in parallel (in 32-bit floating-point format) for the x86 SSE architecture, and up to 8 codeword decodings for the x86 AVX architecture;

- 8 codeword decodings in parallel (in 8-bit fixed-point format³) for the x86 SSE architecture, and up to 16 codeword decodings for the x86 AVX2 architecture.

However, using SIMD processing is not cost-free: consecutive frames received by the LDPC decoder must first be interleaved, to align data from the different frames in memory before decoding, and then deinterleaved at the end of the computation.
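The frame interleaving mentioned above amounts to a transpose between a frame-major and an element-major layout, so that one SIMD load fetches the i-th value of all frames at once. A minimal scalar sketch (the actual decoder presumably uses SIMD shuffles; the function names are ours):

```cpp
#include <cstddef>
#include <vector>

// Interleave F frames of N values each: src is frame-major (src[f*N + i]),
// dst is element-major (dst[i*F + f]), so the i-th value of every frame
// becomes contiguous in memory.
void interleave(const std::vector<float>& src, std::vector<float>& dst,
                std::size_t F, std::size_t N) {
    for (std::size_t f = 0; f < F; ++f)
        for (std::size_t i = 0; i < N; ++i)
            dst[i * F + f] = src[f * N + i];
}

// Inverse transform, applied once decoding is finished.
void deinterleave(const std::vector<float>& src, std::vector<float>& dst,
                  std::size_t F, std::size_t N) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t f = 0; f < F; ++f)
            dst[f * N + i] = src[i * F + f];
}
```

The cost of the two transposes is linear in the frame length and is amortized over the whole iterative decoding, so it is usually negligible compared with the SIMD speed-up.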

C. LDPC decoder specialization

The LDPC decoder performance increases as (a) the number of executed instructions decreases and (b) the temporary data

³Data are stored in memory in 8-bit format; however, processing requires


Evaluate the performance of a decoding heuristic (and select the most efficient parameter value)

Choose the (fixed-point) data format providing the lowest hardware complexity and the best performance (BER)

(13)


Pre-implementation performance study

๏ Faithfully estimate performance before developing the circuit,

➡ Simulation of digital systems on computers / workstations,

➡ Realistic models of the behavior,

‣ Comparison of algorithms,

‣ Comparison of codes,

‣ Evaluation of optimizations,

➡ Time-consuming studies (> weeks of simulation),

๏ Starting model < 100 kbps,


(14)


Application to LDPC codes (x86 target), different codes


[Chart] Simulated throughput for LDPC codes from 200x100 up to 64800x32400, compiled with GCC 4.6 (scale 0 to 120).

State of the art

First works carried out on desktop computers.

Gain on simulation times of around a factor of 100.

« High throughput LDPC decoding on GPU and CPU targets ». IEEE Transactions / Springer, January 2014.

(15)


Application to LDPC codes (GPU target), different #iters


[Chart] Simulated throughput versus the number of decoding iterations (3 to 25), for codes from 816x408 up to 64kx32k (scale 0 to 1500).

State of the art

Study carried out on GPU targets.

Gain by a factor of 4 to 16 over the state of the art.

Simulation 7 times faster than the actual circuit…

« A high throughput efficient approach for decoding LDPC codes onto GPU devices ». IEEE Embedded Systems Letters, November 2014.

(16)


Application to Polar Codes, different code sizes


[Chart] Simulated throughput for Polar code sizes from 256 up to 512k, for s = 0.7 and r = 0.2 / 0.5 / 0.9 (scale 0 to 1800).

State of the art

Application of the approach to other FEC families.

Achieved throughputs above 1 Gbps => is it still worth designing dedicated circuits?

« More than 1Gbps throughput Polar Codes decoders on CPU target ». IEEE Transactions / Springer, January-March 2014.

(17)


SDR: the first digital SoCs for 2014?


The Tegra 4 platform for mobile phones & tablets (2014).

SDR in action!


TABLE III
POLAR CODE DECODER PERFORMANCES (T/P: throughput in Mbps, L: latency in µs, M: memory footprint in kBytes).

Rate      2^9    2^10   2^11   2^12   2^13   2^14   2^15
T/P 0.20  107    94.9   75.8   66.7   58.4   53.3   44.2
T/P 0.50  72.9   70.5   62.6   56.8   51.6   47.1   39.3
T/P 0.75  67.7   62.3   55.4   50.3   50.3   42.4   35.8
T/P 0.90  65.4   58.6   51.7   46.9   46.9   38.8   33.1
L   0.20  76     172    432    981    2244   4914   11864
L   0.50  112    232    522    1153   2540   5564   13344
L   0.75  121    262    591    1300   2601   6173   14624
L   0.90  125    279    633    1396   2793   6740   15844
M         18.12  36.25  72.5   145    290    580    1160

III. CONCLUSION

Not yet...

(18)


Dedicated architectures (ASIC, FPGA)

