HAL Id: hal-02131970
https://hal.inria.fr/hal-02131970
Submitted on 16 May 2019
High-level synthesis and arithmetic optimizations
Yohann Uguen, Florent de Dinechin, Steven Derrien
To cite this version:
Yohann Uguen, Florent de Dinechin, Steven Derrien. High-level synthesis and arithmetic optimizations. Compas’2016, Jul 2016, Lorient, France. ⟨hal-02131970⟩
CONTEXT
• Computing with real numbers
• Design a hardware accelerator
• Targeting FPGAs
• Trade-off between (1) performance, (2) accuracy, (3) resource usage, (4) ease of use
MERGING APPLICATION-SPECIFIC ARITHMETIC AND HLS
Running example: the sum of products $\sum_i x_i \times y_i$
FloPoCo [1]
flopoco FPLargeAcc wex=8 wfx=23 msba=17 lsba=-50 maxmsbx=17
float sumOfProduct(float in1[N], float in2[N]) {
    float sum = 0;
    #pragma FPacc VAR=sum MaxAcc=100000 epsilon=1E-15
    /* Bounds chosen so that in2[i-1] and in2[i+1] stay inside the array. */
    for (int i = 1; i < N - 1; i++) {
        sum += in2[i+1];
        sum += in1[i] * in2[i-1];
    }
    return sum;
}
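The pragma states the accumulator bounds as decimal values (MaxAcc=100000, epsilon=1E-15), whereas the FloPoCo command line uses bit positions (msba=17, lsba=-50). One plausible reading, an assumption rather than something the poster states, is msba = ceil(log2(MaxAcc)) and lsba = floor(log2(epsilon)), which reproduces both values. The helper names below are hypothetical.

```c
/* Sketch only: assumed mapping from the pragma's decimal bounds
 * to accumulator bit positions. Both helpers require x > 0. */

/* Smallest k such that 2^k >= x. */
static int ceil_log2(double x) {
    int k = 0;
    double p = 1.0;
    while (p < x)        { p *= 2.0; k++; }
    while (p / 2.0 >= x) { p /= 2.0; k--; }
    return k;
}

/* Largest k such that 2^k <= x. */
static int floor_log2(double x) {
    int k = 0;
    double p = 1.0;
    while (p > x)        { p /= 2.0; k--; }
    while (p * 2.0 <= x) { p *= 2.0; k++; }
    return k;
}
```

With these helpers, ceil_log2(100000.0) gives 17 and floor_log2(1e-15) gives -50, matching the msba and lsba values on the command line.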
Three design flows for this sum of products:
• FloPoCo operator generator (operator specification → VHDL): ⊕ low resource, ⊕ low latency, ⊕ accuracy control, ⊖ difficult to use
• HLS with Vivado (C → VHDL): ⊖ moderate resource, ⊖ moderate latency, ⊖ no accuracy control, ⊕ ease of use
• Source-to-source transformations using GeCoS [2], followed by HLS with Vivado (C → C → VHDL): ⊕ low resource, ⊕ low latency, ⊕ accuracy control, ⊕ ease of use
ACCUMULATOR
Based on the Kulisch and Snyder accumulator [3]
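[Figure: accumulator architecture in three stages.
 Float-to-fixed: the floating-point input (sign, exponent on w_e bits, mantissa on w_f bits) is shifted by a shift value derived from the exponent, then negated according to the sign; the shifted value spans MaxMSBX − LSBA + 1 bits.
 Sum: the fixed-point sum is held in registers of width w_A.
 Fixed-to-float: the w_A-bit sum is negated if needed, then normalized by an LZC + shifter, producing sign, exponent (w_e bits) and mantissa (w_f bits).]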
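The three stages of the figure (float-to-fixed, sum, fixed-to-float) can be mimicked in software. The following is a minimal sketch under stated assumptions, not the FloPoCo operator: it uses a 64-bit accumulator whose least significant bit has weight 2^-40 (narrower than the poster's configuration, chosen so that it fits in int64_t), truncates bits below that weight, and requires inputs and partial sums to stay below 2^23 in magnitude. All names are hypothetical.

```c
#include <stdint.h>

/* Assumed scale: the accumulator LSB has weight 2^-40. */
static const double SKETCH_SCALE = 1099511627776.0; /* 2^40 */

/* Float-to-fixed: align the input on the accumulator grid.
 * Scaling by a power of two is exact in double for any float input
 * in range; the cast truncates bits below 2^-40 toward zero. */
static int64_t float_to_fixed(float x) {
    return (int64_t)((double)x * SKETCH_SCALE);
}

/* Fixed-to-float: a single rounding, applied once at the end. */
static float fixed_to_float(int64_t acc) {
    return (float)((double)acc / SKETCH_SCALE);
}

/* Sum: integer additions are exact, so no rounding error
 * builds up across iterations. */
static float fixed_accumulate(const float *in, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += float_to_fixed(in[i]);
    }
    return fixed_to_float(acc);
}
```

The design point this illustrates is that rounding happens only in the final conversion, instead of once per addition as in a floating-point accumulation loop.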
EXACT MULTIPLIER
[Figure: exact multiplier. Inputs: two floating-point operands, each with a 1-bit sign, a w_e-bit exponent and a w_f-bit mantissa. Output: a 1-bit sign, a (w_e + 1)-bit exponent and a (2 × w_f + 2)-bit mantissa.]
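The output widths in the figure follow from multiplying the two significands without rounding: with the implicit leading bit, each significand has w_f + 1 bits, so the product needs 2 × w_f + 2 bits, and the sum of two exponents needs one extra bit. The following sketch reproduces this for single precision (w_e = 8, w_f = 23). It assumes normal, finite, nonzero inputs; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Unrounded product: value = (-1)^sign * mantissa * 2^(exponent - 46). */
typedef struct {
    uint32_t sign;      /* 1 bit */
    int32_t  exponent;  /* sum of the unbiased exponents (w_e + 1 bits) */
    uint64_t mantissa;  /* 2 * 23 + 2 = 48 bits, 46 of them fractional */
} exact_product_t;

/* Assumes a and b are normal binary32 values (no zero, subnormal,
 * infinity or NaN). */
static exact_product_t exact_multiply(float a, float b) {
    uint32_t ua, ub;
    exact_product_t p;
    uint64_t ma, mb;

    memcpy(&ua, &a, sizeof ua);   /* reinterpret the bits safely */
    memcpy(&ub, &b, sizeof ub);

    p.sign = (ua >> 31) ^ (ub >> 31);
    p.exponent = ((int32_t)((ua >> 23) & 0xFF) - 127)
               + ((int32_t)((ub >> 23) & 0xFF) - 127);

    ma = (ua & 0x7FFFFF) | 0x800000;  /* restore the implicit 1 */
    mb = (ub & 0x7FFFFF) | 0x800000;
    p.mantissa = ma * mb;             /* 24 x 24 -> 48 bits, exact */
    return p;
}
```

For example, exact_multiply(1.5f, -2.5f) yields sign 1, exponent 1 and mantissa 15 × 2^43, which is −1.875 × 2^1 = −3.75.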
SOURCE-TO-SOURCE TRANSFORMATIONS USING GECOS
switch Node do
    case +
        Launch recursively on incoming nodes
    end
    case ×
        Replace with exact multiplier combined with tuned Float-to-fixed operator
    end
    case Accumulation variable
        Ignore
    end
    otherwise
        Insert Float-to-fixed node
    end
endsw
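The traversal above can be sketched in C on a toy data-flow graph. This is an illustration only, not the GeCoS API: the node type, the enumerations and the idea of recording the chosen rewrite in an `act` field are all hypothetical.

```c
#include <stddef.h>

typedef enum { NODE_ADD, NODE_MUL, NODE_ACC_VAR, NODE_OTHER } node_kind;

typedef enum {
    ACT_NONE,                /* node left untouched */
    ACT_EXACT_MUL_TUNED_F2F, /* exact multiplier + tuned float-to-fixed */
    ACT_INSERT_F2F           /* float-to-fixed node inserted after it */
} action;

typedef struct node {
    node_kind    kind;
    struct node *in[2];  /* incoming nodes, NULL when absent */
    action       act;    /* rewrite decided for this node */
} node;

/* Walk the graph from the accumulation root toward its inputs. */
static void transform(node *n) {
    if (n == NULL) {
        return;
    }
    switch (n->kind) {
    case NODE_ADD:      /* case +: recurse on incoming nodes */
        transform(n->in[0]);
        transform(n->in[1]);
        break;
    case NODE_MUL:      /* case x: replace, do not descend further */
        n->act = ACT_EXACT_MUL_TUNED_F2F;
        break;
    case NODE_ACC_VAR:  /* accumulation variable: ignore */
        break;
    default:            /* otherwise: convert the value to fixed point */
        n->act = ACT_INSERT_F2F;
        break;
    }
}
```

Applied to the graph of `sum + in2[i+1] + in1[i]*in2[i-1]`, the multiplication is marked for replacement by the exact multiplier, the `in2[i+1]` load receives a float-to-fixed conversion, and the `sum` node and the additions are left as they are.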
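[Figure: data-flow graph of the loop body before and after the transformation.
 Before: in1[i] × in2[i−1] and in2[i+1] are added into sum.
 After: the × node is replaced by an exact multiplier followed by a tuned Float-to-fixed operator, and a Float-to-fixed operator is inserted on in2[i+1] before the addition into sum.]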
RESULTS
Input values in [0, 1]
100K accumulations
                    FloPoCo’s operator   Naive code   Transformed code
Accumulator width   67                   24           67
LUTs                693                  313          868
DSPs                2                    5            2
Latency             100K                 1000K        100K
Accuracy            24 bits              17 bits      24 bits
REFERENCES
[1] de Dinechin, Florent et al.: An FPGA-specific Approach to Floating-Point Accumulation and Sum-of-Products, FPT 2008
[2] Floc’h, Antoine et al.: GeCoS: A framework for prototyping custom hardware design flows, SCAM 2013
[3] Kulisch, Ulrich and Snyder, Van: The Exact Dot Product As Basic Tool for Long Interval Arithmetic, Computing 2011