• Aucun résultat trouvé

«Les circuits FPGA»

N/A
N/A
Protected

Academic year: 2022

Partager "«Les circuits FPGA»"

Copied!
119
0
0

Texte intégral

(1)

EN208: Technologie des circuits numériques

«Les circuits FPGA»

Bertrand LE GAL


{prenom.nom}@ims-bordeaux.fr  

!

Laboratoire IMS - UMR CNRS 5218 Institut Polytechnique de Bordeaux

Université de Bordeaux France

2013-2014

31.05.2010

L"Université de Bordeaux publie une étude comparative des initiatives campus verts menées à l"échelon international, qui représente une source précieuse d"informations et de réflexions pour l"élaboration de son nouveau modèle d"Université dans le cadre de l"Opération campus. Elle fait le choix de diffuser en accès libre cette étude, le développement durable étant l"affaire de tous et pour l"intérêt de tous.

L*Université de Bordeaux s*est engagée à bâtir un nouveau modèle d*Université, et parallèlement à de- venir leader en matière de développement durable. C*est en ce sens que début 2009, elle a répondu favorablement, conjointement avec l*Université Bordeaux 1 Sciences Technologies, à la proposition d*Ecocampus-Nobatek et d*EDF : réaliser un retour d*expériences et des analyses sur des projets campus verts en France, Europe et Amérique du Nord.

L*objectif de cette étude (cf. page suivante) a été d*observer et de capturer les bonnes pratiques et ac- tions exemplaires relatives aux grands piliers du développement durable : domaines économiques, so- ciaux, environnementaux et organisationnels. L*Université de Bordeaux va s*y référer pour mettre en Wuvre une gouvernance et une stratégie à long terme au service d*un campus plus vivable et plus équi- table pour l*ensemble de la communauté universitaire.

Avec le Grenelle de l*environnement comme repère à atteindre puis à dépasser, l*Université de Bor- deaux entend constituer un site pilote à travers une démarche de développement durable globale par : - l*intégration permanente des dimensions humaines dans le projet immobilier et l*aménagement (acces- sibilité, santé, lisibilité, confort, cadre de vie) ;

- une transformation énergétique radicale des bâtiments dans le cadre de leur rénovation en démarche HQE® et un schéma directeur énergétique pour une réduction maximale des gaz à effet de serre ; - la mise en valeur et la sanctuarisation d*un parc sur le site universitaire de Talence-Pessac- Gradignan, véritable poumon vert à l*échelle de l*agglomération, atout exceptionnel pour la qualité de vie des usagers et le développement de la biodiversité en milieu urbain ;

- un plan de déplacement sur l*ensemble des domaines du campus universitaire, afin de réduire l*usage individuel de la voiture et son impact en s*appuyant sur des réseaux de transports en commun perfor- mants et le développement des modes alternatifs ;

- une ouverture concertée sur la ville, visant à favoriser le développement économique des territoires, celui de la vie de campus et à créer une mixité sociale et fonctionnelle ;

- et enfin, condition sine qua non de réussite, la mise en place d*un processus d*information et de concer- tation auprès de tous les membres et acteurs de l*Université, pour une compréhension partagée des en- jeux et un apprentissage des comportements responsables.

Aussi, l*Université de Bordeaux entend-elle élaborer un agenda 21 et faire de son campus un site d*expérimentation permettant de développer des approches innovantes à partir des compétences des laboratoires.

L*étude « Initiatives campus verts » est téléchargeable sur le site www.univ-bordeaux.fr

Contacts presse Université de Bordeaux

Anne SEYRAFIAN . Norbert LOUSTAUNAU . T 33 (0)5 56 33 80 84 . [email protected] Contact Nobatek-Ecocampus

Julie CREPIN, chef de projet . T 33 (0)5 56 84 63 72 . [email protected]

L*Université de Bordeaux

Vers un nouveau modèle d"Université DURABLE

(2)

Then a miracle occurs...

(3)

Plan du cours

๏ Introduction

➡ Classification des circuits (re)configurables

➡ Evolution des circuits FPGAs,

➡ Contexte d’utilisation des FPGAs.

๏ Architecture de circuit FPGA

➡ L’élément configurable de base,

➡ Les blocs dédiés (DSP, RAM).

๏ La configuration d’un circuit FPGA

➡ Différentes techniques,

➡ Différents types.

๏ Les fabricants de circuit FPGA

➡ Les circuits FPGA chez Xilinx,

➡ Les circuits FPGA chez Altera.

๏ Le flot de conception d’un circuit FPGA

(4)

Les cibles technologiques  

d’intégration numérique

(5)

Evolution de la complexité d’intégration

(6)

Les principales familles de cibles technologiques

Architecture  dédiée Implanta2on  logicielle Codesign  &  so8cores

Circuit  ASIC  préconçu  et  fondu  par  un  2ers

(7)

Introduction aux FPGA

(8)

Classification des circuits reconfigurables

๏ Les circuits programmables ne sont pas vraiment des ASICs :

➡ Circuits directement disponibles chez le fournisseur,

➡ Développement et programmation «rapide»,

➡ Pas besoin de fonderie.

๏ Différentes familles de circuit (re)programmable :

➡ PLD (Programmable Logic Device); circuits de 100-200 portes,

➡ CPLD (Complex Programmable Logic Device); circuits de 5-10 K portes,

➡ FPGA (Field Programmable Gate Array); circuits atteignant +10

6

portes.

๏ Différentes techniques de programmation :

➡ Cellules à fusible ou à antifusible,

➡ Mémoire effaçable électriquement (EEPROM & Flash EPROM),

➡ Mémoire SRAM.

(9)

Classification des circuits reconfigurables

๏ Différentes techniques de programmation :

➡ Cellules à fusible ou à antifusible,

➡ Mémoire effaçable électriquement
 (EEPROM & Flash EPROM),

➡ Mémoire SRAM.

Antifusible EPROM SRAM

Facilité de programmation faible très importante importante

Densité faible élevé très élevé

Famille de circuit FPGA PLD CPLD CPLD FPGA

Reprogrammabilité non oui oui

(10)

Les circuits configurables de type PAL

๏ La plupart des PALs ont la structure suivante :

➡ Un ensemble d’opérateurs « ET » sur lesquels viennent se connecter les variables d’entrée et leurs compléments,

➡ Un ensemble d’opérateurs « OU » sur lesquels les sorties des opérateurs « ET » sont connectées,

➡ Une éventuelle structure de sortie (Portes inverseurs, logique 3-états, registres, etc.).

๏ Les interconnexions de ces matrices sont programmables.

a b

: Fusible intact

Matrice des ET

OU Sorties

PAL

(11)

Les circuits configurables de type CPLD

๏ Un CPLD se présente comme un ensemble de fonctions de type PAL reliées à l’aide d’une matrice de connexion.

๏ Intérêt

➡ Développement de circuits numériques de faible complexité (5000 à 100000 portes)

๏ Exemples

➡ Altera MAXII: 240 à 2200 éléments logiques,

➡ Xilinx CoolRunner2: 32 à 512 macrocells.

Bloc logique

Bloc logique

Bloc entrée sortie

Bloc entrée sortie Bloc

entrée sortie

Bloc entrée sortie

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique Matrice

de connexion

(12)

Les circuits configurables de type FPGA

๏ Les FPGAs comprennent:

➡ De nombreux modules logiques,

➡ Un réseau d’interconnexions entre les modules logiques,

‣ Réseau non centralisé.

๏ Deux familles de FPGA :

➡ FPGA de type SRAM (reprogrammable),

➡ FPGA de type antifusible
 (non reprogrammable).

๏ Les FPGAs ont des capacités d’intégration élevées.

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique Bloc

logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc logique

Bloc entrée sortie

Bloc entrée sortie Bloc

entrée sortie

Bloc entrée sortie

Canaux de routage Connexions

programmables

(13)

Intérêt des circuits (re)configurables de type FPGA

(14)

Comparaison des circuits (re)configurables versus les ASICs

(15)

Evolution des technologies d’intégrations utilisées 

! !

FPGA!

µ

(16)

Evolution des couts de production des circuits

(17)

Contexte d’utilisation des circuits FPGA

Contexte d utilisation en

production grande série

(18)

Les domaines d’utilisation des FPGA dans l’industrie (2008)

(19)

Classement des vendeurs de FPGA

(20)

La classification des FPGA

๏ Classification suivant le mode de configuration,

➡ FPGA à anti-fusibles (programmable),

‣ le circuit est figé physiquement au niveau des connexions

➡ FPGA à cellules SRAM (reprogrammable),

‣ le circuit est reconfigurable à l’aide des cellules SRAM (modification des

interconnexions).

๏ Classification en fonction de l’architecture interne,

➡ Architecture en ilots de calcul,

➡ Architecture hiérarchique,

Architecture logarithmique,

(21)

Architecture interne des circuits FPGA

(22)

Etude des FPGA de chez Xilinx

(23)

Les domaines d’utilisation des FPGA dans l’industrie (2008)

Les  éléments  fonc;onnels  (logique,  mémoire,  IO)  sont  regroupés  sous  forme  de  matrice.

(24)

La régularité de l’architecture vue du silicium

XILINX  Virtex  II

Intel  Processor  Quad-­‐core  Haswell

XILINX  Virtex  4 Intel  Processor  Core-­‐i7

NVIDIA  K110  GPU

(25)

Technologie des circuits numériques - les circuits FPGA

Bertrand LE GAL 2013-2014

Elément logique de base (cellule élémentaire), qu’est ce donc ?

25

La cellule ou

Comment générer une fonction logique quelconque?

OUT

DFF pour la logique séquentielle

LUT ≡ mémoire

Cellule = LUT (+ bascule)

In 0 In 1

In 2

In 3

configuration

In1 LUT4

In2 In3 in4

Modes d’utilisation de la LUT :

• Additionneur 1 bit : 1LUT4 = 2 LUT3 (résultat,retenue)

• Mémoire RAM 16 bits (Xilinx)

• Registre à décalage (Xilinx)

mémoire

ad1 ad2 ad3

๏ Etudions le principe sur une cellule de base, composée d’une LUT4 entrées et d’une Flip-Flop.

➡ Les cellules réelles sont un peu plus complexes...

๏ Une LUT4 à 4 entrées et correspond à une mémoire de 16 bits.

๏ Comment implanter une fonction logique quelconque sur cette

architecture ?

(26)

Cas d’étude [O = E3.E1 + E2.E0 + E1.E0] - Solution ASIC

Equation logique

Le processus d’intégration est

contraint par la bibliothèque de portes logiques fournie par le fondeur.

+

(27)

Cas d’étude [O = E3.E1 + E2.E0 + E1.E0] - Solution FPGA

Equation logique

Le processus d’intégration est contraint par les ressources disponibles dans le FPGA.

LUT à 4 entrées = mémoire 16-bits. !

Il faut précalculer le contenu de la mémoire !

à partir de la fonction logique à intégrer.

Peu importe la complexité de la fonction logique (4 entrées max.) => 1 LUT4.

Toutefois... le pendant est une fonction !

logique à 1 entrée (i.e. NOT) => 1 LUT4.

(28)

Cas d’étude [O = E3.E1 + E2.E0 + E1.E0] - Solution FPGA

Equation logique

(29)

Cas d’étude [O = E3.E1 + E2.E0 + E1.E0] - Solution FPGA

Equation logique

(30)

Cas d’étude [O = E3.E1 + E2.E0 + E1.E0] - Solution FPGA

0 0 1 1 1

0 1 0 0 1

0 1 0 1 1 B

0 1 1 0 0

0 1 1 1 1

1 0 0 0 0

1 0 0 1 1 E

1 0 1 0 1

1 0 1 1 1

1 1 0 0 1

1 1 0 1 1 B

1 1 1 0 0

1 1 1 1 1

To obtain the INIT attribute, we read the output states in groups of 4 from the bottom up and convert them into hex characters. The first four bits (marked in blue in the table) are 1011, so the first hex character of the INIT attribute is "B".

We can see that the next four bits (marked in red) are 1110, so the next character in the INIT attribute is "E". The complete INIT attribute is BEB8.

[ Back to top ]

Applying INIT in VHDL source files Applying INIT in VHDL source files

The easiest way to set the INIT attribute is by applying a generic to the instantiated LUT:

LIBRARY UNISIM;

USE UNISIM.Vcomponents.ALL;

. .

ARCHITECTURE ArcTop OF top IS

BEGIN

i_LUT : LUT4 GENERIC MAP(

INIT => x"BEB8"

)

PORT MAP(

I0 => I0, I1 => I1, I2 => I2, I3 => I3, O => O );

(31)

Utilité de la cellule de base étudiée

๏ Que peut on faire avec une LUT à 4 entrées ?

➡ Un additionneur 1-bit,

➡ Une mémoire 16-bits,

➡ Un registre à décalage,

➡ Toutes les équations logiques
 à 4 entrées.

๏ Limitations ?

➡ Les équations logiques à n entrées
 (n > 4).

‣ Nécessaire d’utiliser plusieurs LUT4 interconnectée


les unes avec les autres...


(32)

Pourquoi utiliser des LUTs à 4 entrées ?

Séminaires COMELEC page 18

Combien d'entrées pour la LUT ?

Les FPGAs page 18

Nb LUTs NB LUTs sur le chemin critique interconnexion

Temps critique

Elias Ahmed, Jonathan Rose: The effect of LUT and cluster size on deep-submicron FPGA performance and density. IEEE Trans. VLSI Syst. 12(3): 288-298 (2004)

Si beaucoup d’entrées : mémoire énorme mais moins de LUT sur le chemin de calcul . Quel est le meilleur compromis ?

Séminaires COMELEC page 18

Combien d'entrées pour la LUT ?

Les FPGAs page 18

Nb LUTs NB LUTs sur le chemin critique interconnexion

Temps critique

Elias Ahmed, Jonathan Rose: The effect of LUT and cluster size on deep-submicron FPGA performance and density. IEEE Trans. VLSI Syst. 12(3): 288-298 (2004)

Si beaucoup d’entrées : mémoire énorme mais moins de LUT sur le chemin de calcul . Quel est le meilleur compromis ?

Toutefois, aujourd’hui les LUTs ont [6, 8] entrées et [1, 2] sorties

(33)

Evolution de la taille des LUTs chez Xilinx

Le  passage  des  LUT4  (Virtex-­‐4)  aux  LUT6   (Virtex-­‐5,6,7)  a  été  réalisée  an  

factorisant  2  LUTs  de  taille  inférieure.

Réduc;on  du  nombre  d’étages  de  logique   (LUT)  a  traverser  pour  implanter  des   fonc;ons  complexes  (chemin  cri;que).

Risque  d'inefficacité  silicium  :  une  LUT6   pour  implanter  une  fonc;on  B  =  non  A.  

=>  Fonc;onnement  en  mode  2  sor;es.

(34)

Interconnexion des éléments configurables

WILTON' UNIVERSAL'

Les$blocs'de'connexions$assurent$la$

connexion$des$éléments$configurables$

$

Les$matrices'de'connexions$assurent$la$

connexion$des$blocs$de$connexions'

BC':'

les'blocs'de'connexions$$ MC':'

les'matrices'de' connexions$$

19$

(35)

Interconnexion des éléments configurables

Le réseau de routage

L architecture peut être vue comme un tableau de

matrice de connexion

Ressources de routage hiérarchiques

Il existe également des ressources de routage

(36)

La structure interne des FPGA est plus complexe

๏ En théorie, toutes les fonctions numériques peuvent être

intégrées à l’aide de LUTx,

➡ Certaines implantations s'avèrent toutefois inefficaces !

๏ Ajout de ressources

spécialisées dans le FPGA,

➡ Bloc mémoire (RAM 18/36kbits)

➡ Blocs DSP (MAC)

๏ D’autres blocs dédiés peuvent être disponibles,

➡ PLL, PCIe, Ethernet, etc.

(37)

Cellule élémentaire

➡ Un générateur de fonction LUT (Look Up Table),

➡ Une chaîne de propagation rapide de la retenue,

➡ Une bascule D.

Décomposition de l’élément logique de base (CLB)

๏ Le bloc de logique CLB (Configurable Logic Block) correspond à l’élément logique (configurable) de base chez Xilinx.

๏ Un CLB est construit à partir de cellules élémentaires (de 2 à 4

cellules en règle générale).

(38)

Terminologie dans la structure des FPGA de chez Xilinx

Xilinx FPGA

structure

(39)

Terminologie dans la structure des FPGA de chez Xilinx

Configurable Logic Block

(CLB)

Xilinx FPGA

structure

(40)

Terminologie dans la structure des FPGA de chez Xilinx

Configurable Logic Block

(CLB)

Xilinx FPGA structure

2 www.xilinx.com WP405 (v1.0) March 6, 2012

Introduction

Introduction

The configurable logic block (CLB) is the core of the logic structure of Xilinx FPGAs.

Within a CLB reside slices that consist of look-up tables (LUTs), carry chains, and registers. These slices can be configured to perform logical functions, arithmetic functions, memory functions, and shift register functions. Over the years, the quantity of resources within a CLB has evolved to continuously provide the optimum capability at the right cost. The original Virtex® and Spartan®-II architectures, which were introduced around the turn of the millennium, provided a CLB consisting of two slices, where a slice contained two four-input LUTs and two registers. Since then, a slice has changed significantly—in 7 series FPGAs, a slice consists of four six-input LUTs (LUT6) and eight registers, as shown in Figure 1.

X-Ref Target - Figure 1

Figure 1: Slice Architecture in 7 Series FPGAs

WP405_01_100711

LUT Slice

Slice

(M ou L)

(41)

Terminologie dans la structure des FPGA de chez Xilinx

Configurable Logic Block

(CLB)

Xilinx FPGA structure

2 www.xilinx.com WP405 (v1.0) March 6, 2012

Introduction

Introduction

The configurable logic block (CLB) is the core of the logic structure of Xilinx FPGAs.

Within a CLB reside slices that consist of look-up tables (LUTs), carry chains, and registers. These slices can be configured to perform logical functions, arithmetic functions, memory functions, and shift register functions. Over the years, the quantity of resources within a CLB has evolved to continuously provide the optimum capability at the right cost. The original Virtex® and Spartan®-II architectures, which were introduced around the turn of the millennium, provided a CLB consisting of two slices, where a slice contained two four-input LUTs and two registers. Since then, a slice has changed significantly—in 7 series FPGAs, a slice consists of four six-input LUTs (LUT6) and eight registers, as shown in Figure 1.

X-Ref Target - Figure 1

Figure 1: Slice Architecture in 7 Series FPGAs

WP405_01_100711

LUT Slice

Slice (M ou L)

Slice Architecture in 7 Series FPGAs

WP405 (v1.0) March 6, 2012 www.xilinx.com 3

Slice Architecture in 7 Series FPGAs

All 7 series FPGA families (Artix™-7, Kintex™-7, and Virtex-7 devices) use the same logic architecture: CLBs consisting of two slices. Slices in the 7 series FPGA architecture come in two varieties—those that are capable of implementing logical, shift register, and memory functions in the LUT, called SLICEM, and those that can only implement logical functions in the LUT, called SLICEL. Employing this strategy of full feature SLICEM combined with reduced feature SLICEL enables the optimum capability and performance while maintaining low cost and low power. The 7 series FPGA slice architecture is based closely on the slice architecture introduced in the Virtex-6 and Spartan-6 families. The similarity between the Virtex-6, Spartan-6, and 7 series FPGA slice architecture provides an easy migration path for existing designs and IP into 7 series FPGAs; designers can migrate their designs to the latest features and highest performance, lowest power devices with minimal redesign effort.

Additionally, using the same scalable, optimized architecture for all 7 series FPGAs allows designs originally targeting one 7 series FPGA family to be ported easily to another 7 series FPGA family.

Slices are combined in a CLB in pairs with either two SLICEL or one SLICEL with one SLICEM. The 7 series FPGAs are built on the column-based ASMBL™ architecture, which allows for the easy placement of resources where the designer needs them. In this case, the memory-capable slices are most prevalent in proximity to the columns of DSP slices, providing designers storage for coefficients close to where they are required. Xilinx design tools have full knowledge of the relative placement of resources and intelligently and automatically map a design to the resources in the most efficient way while adhering to any constraints specified by the user.

Figure 2 shows how the LUT and registers are arranged in relation to one another.

Figure 2 only includes one LUT and its associated two registers and omits the carry chain. In a full slice, there are four LUTs and eight registers.

The 6-input LUTs are capable of implementing any Boolean logical function that is a product of six input signals but can also be split into two five-input LUTs—as long as the two functions share common inputs. Additionally, a LUT in a SLICEM can be configured as 64 bits of Distributed RAM or up to 32-bit Shift Register Logic (SRL) functions. For more information, see UG474, 7 Series FPGAs Configurable Logic Block User Guide.

X-Ref Target - Figure 2

Figure 2: Layout of 6-Input LUT and Two Registers within a Slice

WP405_02_011912

6-Input

LUT Register

O6 O5

D Q

CE CLK S/R

Register

D Q

CE CLK S/R

Element

de base

(42)

Les différents types d’architectures de FPGA

Les  réseaux  de  routage  d’une   architecture  hiérarchique  

dépendent  du  niveau  de  hiérarchie.  

Architecture  u;lisée  dans  les  FPGA   !

de  chez  Altera  et  Laace Architecture  hiérarchique  où  chaque  

niveau  i  correspond  à  une  matrice  de  4

2i

  cellules.  

Type  d’architecture  employé  chez  Xilinx. !

(43)

Back to the Past: votre premier TP de VHDL cette année...

Ges;on  du  bouton  rota;f  

et  allumage  des  LEDs

(44)

Observation des résultats post-synthèse (RTL)

(45)

Observation des résultats post-synthèse (technologie)

(46)

Observation des résultats post-placement & routage (1/2)

Logic  U;liza;on:  

   Number  of  Slice  Flip  Flops:          39  out  of      9,312        1%  

(47)

Observation des résultats post-placement & routage (1/2)

Logic  U;liza;on:  

(48)

Observation des résultats post-placement & routage (1/2)

Logic  U;liza;on:  

   Number  of  Slice  Flip  Flops:          39  out  of      9,312        1%  

(49)

Observation des résultats post-placement & routage (2/2)

(50)

Observation des résultats post-placement & routage (2/2)

(51)

Observation des résultats post-placement & routage (2/2)

(52)

Etude des FPGA de chez Altera

(53)

Décomposition de l’élément logique configurable chez Altera

L’associa;on  chez  Xilinx  d’une  LUTn  +   d’une  Flip-­‐flop  cons;tue  une  cellule  de  

base.  

Chez  Altera,  la  cellule  de  base  est  plus   !

complexe:  ALUT  +  2  registers  +  2  FA

Adapta;ve  LUT:  une  LUT8  à  plusieurs   sor;es  qui  peut  être  découpée  en  LUTs  

de  tailles  inférieures  pour  maximiser  

l’occupa;on.

(54)

Technologie des circuits numériques - les circuits FPGA

Bertrand LE GAL 2013-2014

Décomposition de l’élément logique configurable chez Altera

47

Altera Corporation Stratix II Performance and Logic Efficiency Analysis

5 represent the Stratix II family’s basic building blocks of logic, providing efficient implementation of user logic functions. Each ALM can be configured to perform combinational logic functions or logic-and- arithmetic operations.

ALM Adaptive Combinational Logic Support

Each ALM contains a variety of look-up table (LUT)-based resources that can be adaptively divided between two ALUTs. With up to eight inputs to the combinational logic block, one ALM can implement various combinations of two functions. Table 2 shows a summary of various supported combinational logic configurations in an ALM. Refer to the Stratix II Device Handbook for detailed architecture descriptions.

Table 2. ALM Adaptive Combinational Logic Configurations

Configuration Description

One Stratix II ALM can be configured to implement two independent four-input (or smaller) LUTs. This configuration can be viewed as the “backward-compatibility” mode. Designs that are optimized for the traditional four-input LUT (4-LUT) FPGAs can easily be migrated to the Stratix II family.

One Stratix II ALM can be configured to implement a five-input LUT (5-LUT) and a three-input LUT (3-LUT).

The inputs to the two LUTs are independent of each other. The 3-LUT can be used to implement any logic function that has three or fewer inputs. Therefore, 5-LUT-2-LUT is also allowed.

One Stratix II ALM can be configured to implement a five-input LUT and a four-input LUT. One of the inputs is shared between the two LUTs. The 5-LUT has up to four independent inputs. The 4-LUT has up to three independent inputs. The sharing of inputs between LUTs is very common in FPGA designs, and the Quartus II software automatically seeks logic functions that are structured in this manner.

One Stratix II ALM can be configured to implement two five-input LUTs (5-LUTs). Two of the inputs between the LUTs are common, and up to three independent inputs are allowed for each 5-LUT.

6-LUT

A Stratix II ALM can implement any six-input function (6-LUT). If there are two six-input functions that have the same logic operation and four shared inputs, then the two six-input functions can be implemented in one Stratix II ALM

For example, a 4x2 crossbar switch that has four data input lines and two sets of unique select signals requires four LEs in the Stratix family. In the Stratix II family, this function only requires one ALM. Another example is a six-input AND gate. An ALM can implement two six-input AND gates that have four common inputs. The same function would require three LEs if implemented in a Stratix device.

One Stratix II ALM in the extended mode can implement a subset of a seven-variable function. The Quartus II software automatically recognizes the applicable seven-input function and fits it into an ALM. Refer to the Stratix II Device Handbook for detailed information about the types of seven-input functions that can be implemented in an ALM.

Altera Corporation Stratix II Performance and Logic Efficiency Analysis

Research has shown that wider LUTs provide better performance, while narrower LUTs provide better logic efficiency. Stratix II ALMs offer not only the desired performance, but also adaptively accommodate the logic functions with different numbers of inputs to provide exceptional logic efficiency that is 25%

better than prior FPGAs.

An analysis of a customer design suite similar to that benchmarked in Figure 3 shows that the LUTs’ size in a synthesized design is widely spread between LUTs with one input to seven inputs. See Figure 4. An FPGA’s logic structure of fixed-size four-input LUTs is quite wasteful when implementing logic functions that do not have exactly four variables. The adaptive nature of the ALMs can minimize the logic resource inefficiency caused by partially utilized LUTs in a fixed-size LUT FPGA architecture. See Figure 5.

Figure 4. LUT Distribution Analysis

7-Input LUT 6-Input LUT 7%

12%

5-Input LUT 29%

4-Input LUT 18%

3-Input LUT 16%

2-Input LUT 13%

1-Input LUT 7%

Figure 5. Five-Variable Function Implementation Comparison Between a Fixed-Size Four-Input LUT FPGA & Stratix II Devices

(55)

Intérêt de la structure modulaire de la LUT8 de chez Altera

Altera Corporation Stratix II Performance and Logic Efficiency Analysis

7 Research has shown that wider LUTs provide better performance, while narrower LUTs provide better logic efficiency. Stratix II ALMs offer not only the desired performance, but also adaptively accommodate the logic functions with different numbers of inputs to provide exceptional logic efficiency that is 25%

better than prior FPGAs.

An analysis of a customer design suite similar to that benchmarked in Figure 3 shows that the LUTs’ size in a synthesized design is widely spread between LUTs with one input to seven inputs. See Figure 4. An FPGA’s logic structure of fixed-size four-input LUTs is quite wasteful when implementing logic functions that do not have exactly four variables. The adaptive nature of the ALMs can minimize the logic resource inefficiency caused by partially utilized LUTs in a fixed-size LUT FPGA architecture. See Figure 5.

Figure 4. LUT Distribution Analysis

7-Input LUT 6-Input LUT 7%

12%

5-Input LUT 29%

4-Input LUT 18%

3-Input LUT 16%

2-Input LUT 13%

1-Input LUT 7%

Figure 5. Five-Variable Function Implementation Comparison Between a Fixed-Size Four-Input LUT FPGA & Stratix II Devices

White Paper

Stratix II Performance and Logic Efficiency Analysis

Introduction

Pursuing higher performance and density of FPGA devices by migrating to a smaller-geometry silicon process presents challenges with power consumption issues. The power consumption of a device based on the 90-nm process rises considerably along with the increase in performance and density. To obtain the desired performance and density without the power consumption penalty requires an improvement to the logic structure in the FPGAs.

Research conducted by E. Ahmed and J. Rose concluded that FPGA logic fabric with wider look-up tables (LUTs) provides better performance, while FPGA logic fabric with narrower LUTs is more cost-effective [1]. Other researchers reached similar conclusions [2] [3] [4] [5] [6]. Figure 1 shows the conceptual view of the relative performance and logic efficiency comparisons of FPGA logic fabrics with different LUT sizes.

Figure 1. Conceptual View of Relative Performance, Cost Effectiveness and LUT Input Size Comparison

The logic structure of the 90-nm-based Stratix® II family addresses the power consumption challenges at 90 nm while delivering an average 50% boost in performance. The power consumption challenge is further mitigated by a 25% logic efficiency improvement on average because the same design takes much less

Virtex-­‐5  LUT6

La  LUT8  de  chez  Altera  permet  un   meilleur  taux  d’u;lisa;on  des  ressources  

versus  les  LUT4  et  LUT6  (en  théorie).

(56)

Technologie des circuits numériques - les circuits FPGA

Bertrand LE GAL 2013-2014

Etude plus fine de la LUT6 de chez Xilinx

49

Slice Architecture in 7 Series FPGAs

WP405 (v1.0) March 6, 2012 www.xilinx.com 3

Slice Architecture in 7 Series FPGAs

All 7 series FPGA families (Artix™-7, Kintex™-7, and Virtex-7 devices) use the same logic architecture: CLBs consisting of two slices. Slices in the 7 series FPGA

architecture come in two varieties—those that are capable of implementing logical, shift register, and memory functions in the LUT, called SLICEM, and those that can only implement logical functions in the LUT, called SLICEL. Employing this strategy of full feature SLICEM combined with reduced feature SLICEL enables the optimum capability and performance while maintaining low cost and low power. The 7 series FPGA slice architecture is based closely on the slice architecture introduced in the Virtex-6 and Spartan-6 families. The similarity between the Virtex-6, Spartan-6, and 7 series FPGA slice architecture provides an easy migration path for existing designs and IP into 7 series FPGAs; designers can migrate their designs to the latest features and highest performance, lowest power devices with minimal redesign effort.

Additionally, using the same scalable, optimized architecture for all 7 series FPGAs allows designs originally targeting one 7 series FPGA family to be ported easily to another 7 series FPGA family.

Slices are combined in a CLB in pairs with either two SLICEL or one SLICEL with one SLICEM. The 7 series FPGAs are built on the column-based ASMBL™ architecture, which allows for the easy placement of resources where the designer needs them. In this case, the memory-capable slices are most prevalent in proximity to the columns of DSP slices, providing designers storage for coefficients close to where they are required. Xilinx design tools have full knowledge of the relative placement of resources and intelligently and automatically map a design to the resources in the most efficient way while adhering to any constraints specified by the user.

Figure 2 shows how the LUT and registers are arranged in relation to one another.

Figure 2 only includes one LUT and its associated two registers and omits the carry chain. In a full slice, there are four LUTs and eight registers.

The 6-input LUTs are capable of implementing any Boolean logical function that is a product of six input signals but can also be split into two five-input LUTs—as long as the two functions share common inputs. Additionally, a LUT in a SLICEM can be configured as 64 bits of Distributed RAM or up to 32-bit Shift Register Logic (SRL) functions. For more information, see UG474, 7 Series FPGAs Configurable Logic Block User Guide.

X-Ref Target - Figure 2

Figure 2: Layout of 6-Input LUT and Two Registers within a Slice

WP405_02_011912

6-Input

LUT Register

O6 O5

D Q

CE CLK S/R

Register

D Q

CE CLK S/R

Common Slice Resource Usage

Common Slice Resource Usage

Versatility is fundamental to programmable logic; designers can use the FPGA slice resources in numerous ways depending on their objectives.

The architecture allows the LUTs to be used independently from the registers. The Bypass (AX/BX/CX/DX) input to the slice allows access to the D input of the registers without going through the LUT, and the combinatorial signals are fed to the output of the slice (Figure 3).

In addition to driving the register directly, the Bypass input can be used to drive the carry chain.

The LUT outputs can feed directly into the D input of the associated registers using the flip-flop multiplexers (Figure 4). The O5 LUT output can be connected to the input of either register (Figure 4), whereas the O6 LUT output can only connect to one of the registers (Figure 2).

X-Ref Target - Figure 3

Figure 3: Use of Bypass Input to Access the Slice Register

X-Ref Target - Figure 4

Figure 4: Using Both Available Registers

WP405_03_011912

6-input LUT

O6

Bypass

Register

D Q

CE CLK S/R

WP405_04_100711

6-Input

LUT Register

O6 O5

D Q

CE CLK S/R

Register

D Q

CE CLK S/R

4 www.xilinx.com WP405 (v1.0) March 6, 2012

Common Slice Resource Usage

Common Slice Resource Usage

Versatility is fundamental to programmable logic; designers can use the FPGA slice resources in numerous ways depending on their objectives.

The architecture allows the LUTs to be used independently from the registers. The Bypass (AX/BX/CX/DX) input to the slice allows access to the D input of the registers without going through the LUT, and the combinatorial signals are fed to the output of the slice (Figure 3).

In addition to driving the register directly, the Bypass input can be used to drive the carry chain.

The LUT outputs can feed directly into the D input of the associated registers using the flip-flop multiplexers (Figure 4). The O5 LUT output can be connected to the input of either register (Figure 4), whereas the O6 LUT output can only connect to one of the registers (Figure 2).

X-Ref Target - Figure 3

Figure 3: Use of Bypass Input to Access the Slice Register

X-Ref Target - Figure 4

Figure 4: Using Both Available Registers

WP405_03_011912

6-input LUT

O6

Bypass

Register

D Q

CE CLK S/R

WP405_04_100711

6-Input

LUT Register

O6 O5

D Q

CE CLK S/R

Register

D Q

CE CLK S/R Common Slice Resource Usage

WP405 (v1.0) March 6, 2012 www.xilinx.com 5

Logical functions that do not share inputs can be created within a single LUT. The A6 input to the LUT is tied High to enable dual LUT mode, leaving the remaining five inputs to the LUT available for independent logical functions. For example, a two-input function and a three-input function that do not share inputs can be packed in the same LUT (Figure 5). If registering the logical outputs, the registers must share the same control signals.

The wide multiplexers, F7 and F8, use the bypass inputs to switch between two LUT6 outputs, providing a means of implementing functions wider than six inputs in a single level of CLBs.

X-Ref Target - Figure 5

Figure 5: Two Independent Logical Functions

X-Ref Target - Figure 6

Figure 6: Wide Logic Functions Using F7 and F8 Multiplexers

WP405_05_100711

6-Input

LUT Register

O6 Inputs 1:3

O5

D Q

CE CLK S/R

Register

D Q

CE CLK S/R Inputs 4:5

Slice

LUT

LUT

LUT

LUT

F7 MUX

F7 MUX

F8 MUX

WP405_06_013012

Common Slice Resource Usage

WP405 (v1.0) March 6, 2012 www.xilinx.com 5

Logical functions that do not share inputs can be created within a single LUT. The A6 input to the LUT is tied High to enable dual LUT mode, leaving the remaining five inputs to the LUT available for independent logical functions. For example, a two-input function and a three-input function that do not share inputs can be packed in the same LUT (Figure 5). If registering the logical outputs, the registers must share the same control signals.

The wide multiplexers, F7 and F8, use the bypass inputs to switch between two LUT6 outputs, providing a means of implementing functions wider than six inputs in a single level of CLBs.

X-Ref Target - Figure 5

Figure 5: Two Independent Logical Functions

X-Ref Target - Figure 6

Figure 6: Wide Logic Functions Using F7 and F8 Multiplexers

WP405_05_100711

6-Input

LUT Register

O6 Inputs 1:3

O5

D Q

CE CLK S/R

Register

D Q

CE CLK S/R Inputs 4:5

Slice

LUT

LUT

LUT

LUT

F7 MUX

F7 MUX

F8 MUX

WP405_06_013012

La  LUT6  de  chez  Xilinx  peut  se  

décomposer  en  2  LUT5  si  les  entrées  

sont  iden;ques  (différent  chez  Altera).

(57)

Technologie des circuits numériques - les circuits FPGA

Bertrand LE GAL 2013-2014

2–2 Altera Corporation

Stratix II Device Handbook, Volume 1 May 2007

Functional Description

Each Stratix II device I/O pin is fed by an I/O element (IOE) located at the end of LAB rows and columns around the periphery of the device.

I/O pins support numerous single-ended and differential I/O standards.

Each IOE contains a bidirectional I/O buffer and six registers for registering input, output, and output-enable signals. When used with dedicated clocks, these registers provide exceptional performance and interface support with external memory devices such as DDR and DDR2 SDRAM, RLDRAM II, and QDR II SRAM devices. High-speed serial interface channels with dynamic phase alignment (DPA) support data transfer at up to 1 Gbps using LVDS or HyperTransportTM technology I/O standards.

Figure 2–1 shows an overview of the Stratix II device.

Figure 2–1. Stratix II Block Diagram

M512 RAM Blocks for Dual-Port Memory, Shift Registers, & FIFO Buffers

DSP Blocks for Multiplication and Full Implementation of FIR Filters

M4K RAM Blocks for True Dual-Port Memory & Other Embedded Memory Functions

IOEs Support DDR, PCI, PCI-X, SSTL-3, SSTL-2, HSTL-1, HSTL-2, LVDS, HyperTransport & other I/O Standards

IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs

LABs LABs IOEs

LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs

LABs LABs IOEs

LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs

LABs LABs

LABs

IOEs IOEs

LABs

LABs LABs

LABs LABs

LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs

LABs LABs

LABs LABs

LABs LABs LABs LABs LABs LABs

LABs LABs LABs LABs

LABs

LABs LABs

LABs LABs

LABs LABs

LABs LABs LABs LABs LABs LABs LABs LABs LABs

DSPBlock

M-RAM Block

Structuration du Stratix II, FPGA de chez Altera

50

Stratix II Architecture

The number of M512 RAM, M4K RAM, and DSP blocks varies by device along with row and column numbers and M-RAM blocks. Table 2–1 lists the resources available in Stratix II devices.

Logic Array Blocks

Each LAB consists of eight ALMs, carry chains, shared arithmetic chains, LAB control signals, local interconnect, and register chain connection lines. The local interconnect transfers signals between ALMs in the same LAB. Register chain connections transfer the output of an ALM register to the adjacent ALM register in an LAB. The Quartus®II Compiler places associated logic in an LAB or adjacent LABs, allowing the use of local, shared arithmetic chain, and register chain connections for performance and area efficiency. Figure 2–2 shows the Stratix II LAB structure.

Table 2–1. Stratix II Device Resources Device M512 RAM

Columns/Blocks M4K RAM

Columns/Blocks M-RAM

Blocks DSP Block

Columns/Blocks LAB

Columns LAB Rows

EP2S15 4 / 104 3 / 78 0 2 / 12 30 26

EP2S30 6 / 202 4 / 144 1 2 / 16 49 36

EP2S60 7 / 329 5 / 255 2 3 / 36 62 51

EP2S90 8 / 488 6 / 408 4 3 / 48 71 68

EP2S130 9 / 699 7 / 609 6 3 / 63 81 87

EP2S180 11 / 930 8 / 768 9 4 / 96 100 96

1  LAB  =  8  ALM

(58)

Technologie des circuits numériques - les circuits FPGA

Bertrand LE GAL 2013-2014

2–2 Altera Corporation

Stratix II Device Handbook, Volume 1 May 2007

Functional Description

Each Stratix II device I/O pin is fed by an I/O element (IOE) located at the end of LAB rows and columns around the periphery of the device.

I/O pins support numerous single-ended and differential I/O standards.

Each IOE contains a bidirectional I/O buffer and six registers for registering input, output, and output-enable signals. When used with dedicated clocks, these registers provide exceptional performance and interface support with external memory devices such as DDR and DDR2 SDRAM, RLDRAM II, and QDR II SRAM devices. High-speed serial interface channels with dynamic phase alignment (DPA) support data transfer at up to 1 Gbps using LVDS or HyperTransportTM technology I/O standards.

Figure 2–1 shows an overview of the Stratix II device.

Figure 2–1. Stratix II Block Diagram

M512 RAM Blocks for Dual-Port Memory, Shift Registers, & FIFO Buffers

DSP Blocks for Multiplication and Full Implementation of FIR Filters

M4K RAM Blocks for True Dual-Port Memory & Other Embedded Memory Functions

IOEs Support DDR, PCI, PCI-X, SSTL-3, SSTL-2, HSTL-1, HSTL-2, LVDS, HyperTransport & other I/O Standards

IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs IOEs

LABs LABs IOEs

LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs

LABs LABs IOEs

LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs

LABs LABs

LABs

IOEs IOEs

LABs

LABs LABs

LABs LABs

LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs LABs

LABs LABs

LABs LABs

LABs LABs LABs LABs LABs LABs

LABs LABs LABs LABs

LABs

LABs LABs

LABs LABs

LABs LABs

LABs LABs LABs LABs LABs LABs LABs LABs LABs

DSPBlock

M-RAM Block

Logic Array Blocks

Figure 2–2. Stratix II LAB Structure

LAB Interconnects

The LAB local interconnect can drive ALMs in the same LAB. It is driven by column and row interconnects and ALM outputs in the same LAB.

Neighboring LABs, M512 RAM blocks, M4K RAM blocks, M-RAM blocks, or DSP blocks from the left and right can also drive an LAB's local interconnect through the direct link connection. The direct link

connection feature minimizes the use of row and column interconnects, providing higher performance and flexibility. Each ALM can drive 24 ALMs through fast local and direct link interconnects. Figure 2–3 shows the direct link connection.

Direct link interconnect from adjacent block

Direct link interconnect to adjacent block Row Interconnects of

Variable Speed & Length

Column Interconnects of Variable Speed & Length Local Interconnect is Driven

from Either Side by Columns & LABs,

& from Above by Rows Local Interconnect LAB

Direct link interconnect from adjacent block

Direct link interconnect to adjacent block

ALMs

Structuration du Stratix II, FPGA de chez Altera

50

Stratix II Architecture

The number of M512 RAM, M4K RAM, and DSP blocks varies by device along with row and column numbers and M-RAM blocks. Table 2–1 lists the resources available in Stratix II devices.

Logic Array Blocks

Each LAB consists of eight ALMs, carry chains, shared arithmetic chains, LAB control signals, local interconnect, and register chain connection lines. The local interconnect transfers signals between ALMs in the same LAB. Register chain connections transfer the output of an ALM register to the adjacent ALM register in an LAB. The Quartus®II Compiler places associated logic in an LAB or adjacent LABs, allowing the use of local, shared arithmetic chain, and register chain connections for performance and area efficiency. Figure 2–2 shows the Stratix II LAB structure.

Table 2–1. Stratix II Device Resources Device M512 RAM

Columns/Blocks M4K RAM

Columns/Blocks M-RAM

Blocks DSP Block

Columns/Blocks LAB

Columns LAB Rows

EP2S15 4 / 104 3 / 78 0 2 / 12 30 26

EP2S30 6 / 202 4 / 144 1 2 / 16 49 36

EP2S60 7 / 329 5 / 255 2 3 / 36 62 51

EP2S90 8 / 488 6 / 408 4 3 / 48 71 68

EP2S130 9 / 699 7 / 609 6 3 / 63 81 87

EP2S180 11 / 930 8 / 768 9 4 / 96 100 96

1  LAB  =  8  ALM

Références

Documents relatifs

The similarity between the Virtex-6, Spartan-6, and 7 series FPGA slice architecture provides an easy migration path for existing designs and IP into 7 series FPGAs; designers

Chacun des éléments logiques dans ce mode comporte quatre LUTs à 2 entrées et 1 sortie (les entrées des quatre LUTs sont les mêmes) et deux multiplexeurs 2 vers 1 connectés

[r]

Можно утверждать, что конструктивные решения, положенные в основу перспективных вычислитель- ных модулей на основе ПЛИС Xilinx Virtex UltraScale,

Pour toutes les adresses en envoi individuel, excepté celles qui commencent par la valeur binaire 000, les identifiants d’interface doivent être longs de 64 bits et être

[r]

[r]

La subordination ou ['enchâssement On désigne par (( proposition subordonnée ou enchâssée, (P2) une proposition qui est enchâssée à I'aide d'un marqueur