HAL Id: lirmm-01277374
https://hal-lirmm.ccsd.cnrs.fr/lirmm-01277374
Submitted on 22 Feb 2016
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Synthesis of certified programs in fixed-point arithmetic,
and its application to linear algebra basic blocks
Mohamed Amine Najahi
To cite this version:
Mohamed Amine Najahi. Synthesis of certified programs in fixed-point arithmetic, and its application
to linear algebra basic blocks. RAIM: Rencontres Arithmétiques de l’Informatique Mathématique,
Apr 2015, Rennes, France. 7ème Rencontres Arithmétiques de l’Informatique Mathématique, 2015.
�lirmm-01277374�
7ème Rencontres Arithmétiques de l’Informatique Mathématique (RAIM2015)
Rennes, 7-9 april 2015
Synthesis of certified programs in fixed-point
arithmetic, and its application to
linear algebra basic blocks
Amine Najahi
Univ. Perpignan Via Domitia, DALI project-team
Univ. Montpellier 2, LIRMM, UMR 5506
CNRS, LIRMM, UMR 5506
D
Which arithmetic for computational tasks?
Floating-point computations
©
Easy and fast to implement
©
Easily portable
[IEEE754]
§
Requires dedicated hardware
§
Slow if emulated in software
Fixed-point computations
§
Tedious and time consuming to implement
• > 50
%
of design time
[Wil98]
©
Relies only on integer instructions
©
Efficient
Embedded systems targets
µ
-controllers
DSPs
FPGAs
→
have efficient integer instructions
Fixed-point arithmetic is well suited for embedded systems
But, how to make it
easy
,
fast
, and
numerically safe
to use by non-expert programmers?
Which arithmetic for computational tasks?
Floating-point computations
©
Easy and fast to implement
©
Easily portable
[IEEE754]
§
Requires dedicated hardware
§
Slow if emulated in software
Fixed-point computations
§
Tedious and time consuming to implement
• > 50
%
of design time
[Wil98]
©
Relies only on integer instructions
©
Efficient
Embedded systems targets
µ
-controllers
DSPs
FPGAs
→
have efficient integer instructions
Fixed-point arithmetic is well suited for embedded systems
Which arithmetic for computational tasks?
Floating-point computations
©
Easy and fast to implement
©
Easily portable
[IEEE754]
§
Requires dedicated hardware
§
Slow if emulated in software
Fixed-point computations
§
Tedious and time consuming to implement
• > 50
%
of design time
[Wil98]
©
Relies only on integer instructions
©
Efficient
Embedded systems targets
µ
-controllers
DSPs
FPGAs
→
have efficient integer instructions
Fixed-point arithmetic is well suited for embedded systems
But, how to make it
easy
,
fast
, and
numerically safe
to use by non-expert programmers?
Which arithmetic for computational tasks?
Floating-point computations
©
Easy and fast to implement
©
Easily portable
[IEEE754]
§
Requires dedicated hardware
§
Slow if emulated in software
Fixed-point computations
§
Tedious and time consuming to implement
• > 50
%
of design time
[Wil98]
©
Relies only on integer instructions
©
Efficient
Embedded systems targets
µ
-controllers
DSPs
FPGAs
→
have efficient integer instructions
Fixed-point arithmetic is well suited for embedded systems
Which arithmetic for computational tasks?
Floating-point computations
©
Easy and fast to implement
©
Easily portable
[IEEE754]
§
Requires dedicated hardware
§
Slow if emulated in software
Fixed-point computations
§
Tedious and time consuming to implement
• > 50
%
of design time
[Wil98]
©
Relies only on integer instructions
©
Efficient
Embedded systems targets
µ
-controllers
DSPs
FPGAs
→
have efficient integer instructions
Fixed-point arithmetic is well suited for embedded systems
But, how to make it
easy
,
fast
, and
numerically safe
to use by non-expert programmers?
The DEFIS approach
DEFIS (ANR, 2011-2015)
Goal:
develop techniques and tools to automate
fixed-point programming
Combines conversion and IP block synthesis
Ï
Ménard et al. (
CAIRN, Univ. Rennes
)
[MCCS02]
:
•
automatic float-to-fix conversion
Ï
Didier et al. (
PEQUAN, Univ. Paris
)
[LHD14]
:
•
code generation for the linear filter IP block
Ï
Our approach (
DALI, Univ. Perpignan
):
•
certified fixed-point synthesis for:
• Fine grained IP blocks:
dot-products,
polynomials, ...
• High level IP blocks:
matrix multiplication,
triangular matrix inversion, Cholesky decomposition
Long term objective:
code synthesis for matrix inversion
Im p lem en ta tio n to o ls In fras tru ct u re fo r t h e d es ig n o f f ix ed -p o in t s ys tem s Al go rit h m le ve l op tim iz at ion IWL Determination Dynamic Range evaluation FWL Determination Back-end S2S transfor-mation Application description Specific block generation Floating-point C code Accuracy evaluation B1 B5 B4 B3 B6 B2 System level optimization Accuracy constraint High level Synthesis Compiler Architecture Fixed-point C code Architecture model Validation & Optimization Parameterized IP blocks
The DEFIS approach
DEFIS (ANR, 2011-2015)
Goal:
develop techniques and tools to automate
fixed-point programming
Combines conversion and IP block synthesis
Ï
Ménard et al. (
CAIRN, Univ. Rennes
)
[MCCS02]
:
•
automatic float-to-fix conversion
Ï
Didier et al. (
PEQUAN, Univ. Paris
)
[LHD14]
:
•
code generation for the linear filter IP block
Ï
Our approach (
DALI, Univ. Perpignan
):
•
certified fixed-point synthesis for:
• Fine grained IP blocks:
dot-products,
polynomials, ...
• High level IP blocks:
matrix multiplication,
triangular matrix inversion, Cholesky decomposition
Long term objective:
code synthesis for matrix inversion
Im p lem en ta tio n to o ls In fras tru ct u re fo r t h e d es ig n o f f ix ed -p o in t s ys tem s Al go rit h m le ve l op tim iz at ion IWL Determination Dynamic Range evaluation FWL Determination Back-end S2S transfor-mation Application description Specific block generation Floating-point C code Accuracy evaluation B1 B5 B4 B3 B6 B2 System level optimization Accuracy constraint High level Synthesis Compiler Architecture Fixed-point C code Architecture model Validation & Optimization Parameterized IP blocks
The DEFIS approach
DEFIS (ANR, 2011-2015)
Goal:
develop techniques and tools to automate
fixed-point programming
Combines conversion and IP block synthesis
Ï
Ménard et al. (
CAIRN, Univ. Rennes
)
[MCCS02]
:
•
automatic float-to-fix conversion
Ï
Didier et al. (
PEQUAN, Univ. Paris
)
[LHD14]
:
•
code generation for the linear filter IP block
Ï
Our approach (
DALI, Univ. Perpignan
):
•
certified fixed-point synthesis for:
• Fine grained IP blocks:
dot-products,
polynomials, ...
• High level IP blocks:
matrix multiplication,
triangular matrix inversion, Cholesky decomposition
Long term objective:
code synthesis for matrix inversion
Im p lem en ta tio n to o ls In fras tru ct u re fo r t h e d es ig n o f f ix ed -p o in t s ys tem s Al go rit h m le ve l op tim iz at ion IWL Determination Dynamic Range evaluation FWL Determination Back-end S2S transfor-mation Application description Specific block generation Floating-point C code Accuracy evaluation B1 B5 B4 B3 B6 B2 System level optimization Accuracy constraint High level Synthesis Compiler Fixed-point C code Architecture model Validation & Optimization Parameterized IP blocks
The DEFIS approach
DEFIS (ANR, 2011-2015)
Goal:
develop techniques and tools to automate
fixed-point programming
Combines conversion and IP block synthesis
Ï
Ménard et al. (
CAIRN, Univ. Rennes
)
[MCCS02]
:
•
automatic float-to-fix conversion
Ï
Didier et al. (
PEQUAN, Univ. Paris
)
[LHD14]
:
•
code generation for the linear filter IP block
Ï
Our approach (
DALI, Univ. Perpignan
):
•
certified fixed-point synthesis for:
• Fine grained IP blocks:
dot-products,
polynomials, ...
• High level IP blocks:
matrix multiplication,
triangular matrix inversion, Cholesky decomposition
Long term objective:
code synthesis for matrix inversion
Im p lem en ta tio n to o ls In fras tru ct u re fo r t h e d es ig n o f f ix ed -p o in t s ys tem s Al go rit h m le ve l op tim iz at ion IWL Determination Dynamic Range evaluation FWL Determination Back-end S2S transfor-mation Application description Specific block generation Floating-point C code Accuracy evaluation B1 B5 B4 B3 B6 B2 System level optimization Accuracy constraint High level Synthesis Compiler Architecture Fixed-point C code Architecture model Validation & Optimization Parameterized IP blocks
Our road-map
How to generate certified fixed-point code for matrix inversion?
1.
Specify an arithmetic model
Ï
Contributions:
•
formalization of p and
/
2.
Build a synthesis tool, CGPE, for fine grained IP
blocks:
Ï
it adheres to the arithmetic model
Ï
Contributions:
•
implementation of the arithmetic model
3.
Build a second synthesis tool, FPLA, for algorithmic
IP blocks:
Ï
it generates code using CGPE
Ï
Contributions:
•
trade-off implementations for matrix multiplication
•
code synthesis for Cholesky decomposition and
triangular matrix inversion
Our road-map
How to generate certified fixed-point code for matrix inversion?
1.
Specify an arithmetic model
Ï
Contributions:
•
formalization of p and
/
2.
Build a synthesis tool, CGPE, for fine grained IP
blocks:
Ï
it adheres to the arithmetic model
Ï
Contributions:
•
implementation of the arithmetic model
3.
Build a second synthesis tool, FPLA, for algorithmic
IP blocks:
Ï
it generates code using CGPE
Ï
Contributions:
•
trade-off implementations for matrix multiplication
•
code synthesis for Cholesky decomposition and
triangular matrix inversion
Arithmetic
model
Our road-map
How to generate certified fixed-point code for matrix inversion?
1.
Specify an arithmetic model
Ï
Contributions:
•
formalization of p and
/
2.
Build a synthesis tool, CGPE, for fine grained IP
blocks:
Ï
it adheres to the arithmetic model
Ï
Contributions:
•
implementation of the arithmetic model
3.
Build a second synthesis tool, FPLA, for algorithmic
IP blocks:
Ï
it generates code using CGPE
Ï
Contributions:
•
trade-off implementations for matrix multiplication
•
code synthesis for Cholesky decomposition and
triangular matrix inversion
Arithmetic
model
F
ix
ed
-po
int synthe
sis
to
ol
CGPE
Our road-map
How to generate certified fixed-point code for matrix inversion?
1.
Specify an arithmetic model
Ï
Contributions:
•
formalization of p and
/
2.
Build a synthesis tool, CGPE, for fine grained IP
blocks:
Ï
it adheres to the arithmetic model
Ï
Contributions:
•
implementation of the arithmetic model
3.
Build a second synthesis tool, FPLA, for algorithmic
IP blocks:
Ï
it generates code using CGPE
Ï
Contributions:
•
trade-off implementations for matrix multiplication
•
code synthesis for Cholesky decomposition and
triangular matrix inversion
Arithmetic
model
F
ix
ed
-po
int synthe
sis
to
ol
CGPE
Alg
orithmic level to
ol
FPLA
Outline of the talk
1. An arithmetic model for fixed-point code synthesis
2. An implementation of the arithmetic model: the CGPE tool
An arithmetic model for fixed-point code synthesis
Outline of the talk
1. An arithmetic model for fixed-point code synthesis
2. An implementation of the arithmetic model: the CGPE tool
3. Fixed-point code synthesis for linear algebra basic blocks
An arithmetic model for fixed-point code synthesis
Fixed-point arithmetic numbers
A fixed-point number x is defined by two integers:
.
X
the k-bit integer representation of x
.
f
the
implicit
scaling factor of x
The value of x is given by x =
2
X
f
=
k
−1−f
X
`=−f
X
`+f
· 2
`
X
7
X
6
X
5
X
4
X
3
X
2
X
1
X
0
k
= 8
i
= 3
f
= 5
Notation
A fixed-point number with
i
bits
of integer part and
f
bits
of fraction part is in the Q
i
.fformat
Example:
Ï
x
in Q
3
.5
and X =
(1001 1000)
2
=
(152)
10
−→
x
=
(100.11000)
2
=
(4.75)
10
An arithmetic model for fixed-point code synthesis
Fixed-point arithmetic numbers
A fixed-point number x is defined by two integers:
.
X
the k-bit integer representation of x
.
f
the
implicit
scaling factor of x
The value of x is given by x =
2
X
f
=
k
−1−f
X
`=−f
X
`+f
· 2
`
X
7
X
6
X
5
X
4
X
3
X
2
X
1
X
0
k
= 8
i
= 3
f
= 5
Notation
A fixed-point number with
i
bits
of integer part and
f
bits
of fraction part is in the Q
i
.fformat
Example:
Ï
x
in Q
3
.5
and X =
(1001 1000)
2
=
(152)
10
−→
x
=
(100.11000)
2
=
(4.75)
10
How to compute with fixed-point numbers?
An arithmetic model for fixed-point code synthesis
Fixed-point arithmetic numbers
A fixed-point number x is defined by two integers:
.
X
the k-bit integer representation of x
.
f
the
implicit
scaling factor of x
The value of x is given by x =
2
X
f
=
k
−1−f
X
`=−f
X
`+f
· 2
`
1
0
0
1
1
0
0
0
k
= 8
i
= 3
f
= 5
Notation
A fixed-point number with
i
bits
of integer part and
f
bits
of fraction part is in the Q
i
.fformat
Example:
Ï
x
in Q
3
.5
and X =
(1001 1000)
2
=
(152)
10
−→
x
=
(100.11000)
2
=
(4.75)
10
An arithmetic model for fixed-point code synthesis
Fixed-point arithmetic numbers
A fixed-point number x is defined by two integers:
.
X
the k-bit integer representation of x
.
f
the
implicit
scaling factor of x
The value of x is given by x =
2
X
f
=
k
−1−f
X
`=−f
X
`+f
· 2
`
1
0
0
1
1
0
0
0
k
= 8
i
= 3
f
= 5
Notation
A fixed-point number with
i
bits
of integer part and
f
bits
of fraction part is in the Q
i
.fformat
Example:
Ï
x
in Q
3
.5
and X =
(1001 1000)
2
=
(152)
10
−→
x
=
(100.11000)
2
=
(4.75)
10
How to compute with fixed-point numbers?
An arithmetic model for fixed-point code synthesis
An interval arithmetic based model
For each coefficient or variable v, we keep track of 2 intervals Val
(
v
)
and Err
(
v
)
Our model assumes a fixed word-length k
Val
(
v
)
is the range of v
the format Q
i
.f
of v is deduced from
Val
(
v
)
=
£v,v¤
Ï
i
=
l
log
2
(
max
(
¯
¯
v
¯
¯
,
¯
¯
v
¯
¯
))
m
+ α
Ï
f
= k − i
α =
(1, if mod ¡log
2
(
v
)
,
1¢ 6= 0,
2, otherwise
Err
(
v
)
encloses the
rounding error of
computing v
a bound
² on rounding
errors is deduced from
Err
(
v
)
=
£e,e¤
Ï
² = max¡¯¯e¯¯,¯¯e¯¯¢
¦ ¦ ¦ a0 ¦ a1 ¦ ¦ a2 ¦ a3 ¦x
¦ ¦ ¦ a4 ¦ a5¦
.
.
Val
(
v
)
= g
¦¡Val
(
v
1)
, Val
(
v
2)
, Err
(
v
1)
, Err
(
v
2)
¢
Err
(
v
)
= h
¦¡Val
(
v
1)
, Val
(
v
2)
, Err
(
v
1)
, Err
(
v
2)
¢
Val
(
v
1)
Err
(
v
1)
Val
(
v
2)
Err
(
v
2)
How to propagate Val
(
v
)
and Err
(
v
)
for ¦ ∈ ©+,−,×,¿,À,p,
/
ª
An arithmetic model for fixed-point code synthesis
An interval arithmetic based model
For each coefficient or variable v, we keep track of 2 intervals Val
(
v
)
and Err
(
v
)
Our model assumes a fixed word-length k
Val
(
v
)
is the range of v
the format Q
i
.f
of v is deduced from
Val
(
v
)
=
£v,v¤
Ï
i
=
l
log
2
(
max
(
¯
¯
v
¯
¯
,
¯
¯
v
¯
¯
))
m
+ α
Ï
f
= k − i
α =
(1, if mod ¡log
2
(
v
)
,
1¢ 6= 0,
2, otherwise
Err
(
v
)
encloses the
rounding error of
computing v
a bound
² on rounding
errors is deduced from
Err
(
v
)
=
£e,e
¤
Ï
² = max¡¯¯e¯¯,¯¯e¯¯¢
¦ ¦ ¦ a0 ¦ a1 ¦ ¦ a2 ¦ a3 ¦x
¦ ¦ ¦ a4 ¦ a5¦
.
.
Val
(
v
)
= g
¦¡Val
(
v
1)
, Val
(
v
2)
, Err
(
v
1)
, Err
(
v
2)
¢
Err
(
v
)
= h
¦¡Val
(
v
1)
, Val
(
v
2)
, Err
(
v
1)
, Err
(
v
2)
¢
Val
(
v
1)
Err
(
v
1)
Val
(
v
2)
Err
(
v
2)
How to propagate Val
(
v
)
and Err
(
v
)
for ¦ ∈ ©+,−,×,¿,À,p,
/
ª
?
An arithmetic model for fixed-point code synthesis
An interval arithmetic based model
For each coefficient or variable v, we keep track of 2 intervals Val
(
v
)
and Err
(
v
)
Our model assumes a fixed word-length k
Val
(
v
)
is the range of v
the format Q
i
.f
of v is deduced from
Val
(
v
)
=
£v,v¤
Ï
i
=
l
log
2
(
max
(
¯
¯
v
¯
¯
,
¯
¯
v
¯
¯
))
m
+ α
Ï
f
= k − i
α =
(1, if mod ¡log
2
(
v
)
,
1¢ 6= 0,
2, otherwise
Err
(
v
)
encloses the
rounding error of
computing v
a bound
² on rounding
errors is deduced from
Err
(
v
)
=
£e,e
¤
Ï
² = max¡¯¯e¯¯,¯¯e¯¯¢
¦ ¦ ¦ a0 ¦ a1 ¦ ¦ a2 ¦ a3 ¦x
¦ ¦ ¦ a4 ¦ a5¦
.
.
Val
(
v
)
= g
¦¡Val
(
v
1)
, Val
(
v
2)
, Err
(
v
1)
, Err
(
v
2)
¢
Err
(
v
)
= h
¦¡Val
(
v
1)
, Val
(
v
2)
, Err
(
v
1)
, Err
(
v
2)
¢
Val
(
v
1)
Err
(
v
1)
Val
(
v
2)
Err
(
v
2)
How to propagate Val
(
v
)
and Err
(
v
)
for ¦ ∈ ©+,−,×,¿,À,p,
/
ª
An arithmetic model for fixed-point code synthesis
An interval arithmetic based model
For each coefficient or variable v, we keep track of 2 intervals Val
(
v
)
and Err
(
v
)
Our model assumes a fixed word-length k
Val
(
v
)
is the range of v
the format Q
i
.f
of v is deduced from
Val
(
v
)
=
£v,v¤
Ï
i
=
l
log
2
(
max
(
¯
¯
v
¯
¯
,
¯
¯
v
¯
¯
))
m
+ α
Ï
f
= k − i
α =
(1, if mod ¡log
2
(
v
)
,
1¢ 6= 0,
2, otherwise
Err
(
v
)
encloses the
rounding error of
computing v
a bound
² on rounding
errors is deduced from
Err
(
v
)
=
£e,e
¤
Ï
² = max¡¯¯e¯¯,¯¯e¯¯¢
¦ ¦ ¦ a0 ¦ a1 ¦ ¦ a2 ¦ a3 ¦x
¦ ¦ ¦ a4 ¦ a5¦
.
.
Val
(
v
)
= g
¦¡Val
(
v
1)
, Val
(
v
2)
, Err
(
v
1)
, Err
(
v
2)
¢
Err
(
v
)
= h
¦¡Val
(
v
1)
, Val
(
v
2)
, Err
(
v
1)
, Err
(
v
2)
¢
Val
(
v
1)
Err
(
v
1)
Val
(
v
2)
Err
(
v
2)
How to propagate Val
(
v
)
and Err
(
v
)
for ¦ ∈ ©+,−,×,¿,À,p,
/
ª
?
An arithmetic model for fixed-point code synthesis
Fixed-point multiplication
The output format of a Q
i1
.f1
× Q
i2
.f2
is Q
i1 + i2
.f1 + f2
But, doubling the word-length is costly
i1
f1
i2
f2
i1 + i2
f1 + f2
fr
•
• • • • • • •
•
• • • • • • •
×
•
• • • • • • •
•
• • • • • • •
Discarded bits
×
. .Val
(v ) =
Val(v1)×
Val(v2)
Err(v ) =
Val(v1)×
Err(v2)
+
Val(v2)×
Err(v1)
+
Err(v1)×
Err(v2)
Val(v ) =
Val(v1)×
Val(v2)−
Err×
Err(v ) =
Err×
+
Val(v1)×
Err(v2)
+
Val(v2)×
Err(v1)
+
Err(v1)×
Err(v2)
Val(v1) Err(v1) Val(v2) Err(v2)Err
×
=
h
0,2
−f
r
− 2
−
(
f
1
+f
2
)
i
This multiplication is available on integer processors and DSPs
i n t 3 2 _ t
mul
(
i n t 3 2 _ t
v1
,
i n t 3 2 _ t
v2
) {
i n t 6 4 _ t
prod
= ((
i n t 6 4 _ t
)
v1
) * ((
i n t 6 4 _ t
)
v2
) ;
retu rn
(
i n t 3 2 _ t
) (
prod
>> 32) ;
An arithmetic model for fixed-point code synthesis
Fixed-point multiplication
The output format of a Q
i1
.f1
× Q
i2
.f2
is Q
i1 + i2
.f1 + f2
But, doubling the word-length is costly
i1
f1
i2
f2
i1 + i2
f1 + f2
fr
•
• • • • • • •
•
• • • • • • •
×
•
• • • • • • •
•
• • • • • • •
Discarded bits
×
. .Val
(v ) =
Val(v1)×
Val(v2)
Err(v ) =
Val(v1)×
Err(v2)
+
Val(v2)×
Err(v1)
+
Err(v1)×
Err(v2)
Val
(v ) =
Val(v1)×
Val(v2)−
Err×
Err(v ) =
Err×
+
Val(v1)×
Err(v2)
+
Val(v2)×
Err(v1)
+
Err(v1)×
Err(v2)
Val(v1) Err(v1) Val(v2) Err(v2)Err
×
=
h
0,2
−f
r
− 2
−
(
f
1
+f
2
)
i
This multiplication is available on integer processors and DSPs
i n t 3 2 _ t
mul
(
i n t 3 2 _ t
v1
,
i n t 3 2 _ t
v2
) {
i n t 6 4 _ t
prod
= ((
i n t 6 4 _ t
)
v1
) * ((
i n t 6 4 _ t
)
v2
) ;
retu rn
(
i n t 3 2 _ t
) (
prod
>> 32) ;
}
An arithmetic model for fixed-point code synthesis
Fixed-point multiplication
The output format of a Q
i1
.f1
× Q
i2
.f2
is Q
i1 + i2
.f1 + f2
But, doubling the word-length is costly
i1
f1
i2
f2
i1 + i2
f1 + f2
fr
•
• • • • • • •
•
• • • • • • •
×
•
• • • • • • •
•
• • • • • • •
Discarded bits
×
. .Val
(v ) =
Val(v1)×
Val(v2)
Err(v ) =
Val(v1)×
Err(v2)
+
Val(v2)×
Err(v1)
+
Err(v1)×
Err(v2)
Val
(v ) =
Val(v1)×
Val(v2)−
Err×
Err(v ) =
Err×
+
Val(v1)×
Err(v2)
+
Val(v2)×
Err(v1)
+
Err(v1)×
Err(v2)
Val(v1) Err(v1) Val(v2) Err(v2)Err
×
=
h
0,2
−f
r
− 2
−
(
f
1
+f
2
)
i
This multiplication is available on integer processors and DSPs
i n t 3 2 _ t
mul
(
i n t 3 2 _ t
v1
,
i n t 3 2 _ t
v2
) {
i n t 6 4 _ t
prod
= ((
i n t 6 4 _ t
)
v1
) * ((
i n t 6 4 _ t
)
v2
) ;
retu rn
(
i n t 3 2 _ t
) (
prod
>> 32) ;
An arithmetic model for fixed-point code synthesis
Fixed-point multiplication
The output format of a Q
i1
.f1
× Q
i2
.f2
is Q
i1 + i2
.f1 + f2
But, doubling the word-length is costly
i1
f1
i2
f2
i1 + i2
f1 + f2
fr
•
• • • • • • •
•
• • • • • • •
×
•
• • • • • • •
•
• • • • • • •
Discarded bits
×
. .Val
(v ) =
Val(v1)×
Val(v2)
Err(v ) =
Val(v1)×
Err(v2)
+
Val(v2)×
Err(v1)
+
Err(v1)×
Err(v2)
Val
(v ) =
Val(v1)×
Val(v2)−
Err×
Err(v ) =
Err×
+
Val(v1)×
Err(v2)
+
Val(v2)×
Err(v1)
+
Err(v1)×
Err(v2)
Val(v1) Err(v1) Val(v2) Err(v2)Err
×
=
h
0,2
−f
r
− 2
−
(
f
1
+f
2
)
i
This multiplication is available on integer processors and DSPs
i n t 3 2 _ t
mul
(
i n t 3 2 _ t
v1
,
i n t 3 2 _ t
v2
) {
i n t 6 4 _ t
prod
= ((
i n t 6 4 _ t
)
v1
) * ((
i n t 6 4 _ t
)
v2
) ;
retu rn
(
i n t 3 2 _ t
) (
prod
>> 32) ;
}
An arithmetic model for fixed-point code synthesis
Our new fixed-point division
The output integer part of Q
i1
.f1
/
Q
i2
.f2
may be as large as i
1
+ f
2
But, doubling the word-length is costly
How to obtain sharper a error bounds on Err
/
?
i1
f1
i2
f2
i1
f1 + k
i2
f2
•
• • • • • • •
•
• • • • • • •
/
•
• • • • • • •
0 0 0 0 0 0 0 0
•
• • • • • • •
÷
×2
k
i1 + f2
i2 + f1
•
• • • • • • •
•
• • • • • • •
i1 + f2
fr
•
• • • • • • •
•
• • • • • • •
ir
fr
•
• • •
•
• • • • • • •
•
• • •
How to decide of the output format of division?
A large integer part
3
prevents overflow
7
loose error bounds and loss of
precision
A small integer part
7
may cause overflow
3
sharp error bounds and more
An arithmetic model for fixed-point code synthesis
Our new fixed-point division
The output integer part of Q
i1
.f1
/
Q
i2
.f2
may be as large as i
1
+ f
2
But, doubling the word-length is costly
How to obtain sharper a error bounds on Err
/
?
i1
f1
i2
f2
i1
f1 + k
i2
f2
•
• • • • • • •
•
• • • • • • •
/
•
• • • • • • •
0 0 0 0 0 0 0 0
•
• • • • • • •
÷
×2
k
i1 + f2
i2 + f1
•
• • • • • • •
•
• • • • • • •
i1 + f2
fr
•
• • • • • • •
•
• • • • • • •
ir
fr
•
• • •
•
• • • • • • •
•
• • •
Err
/
=
£−2
i
2
+f
1
,2
i
2
+f
1
¤
How to decide of the output format of division?
A large integer part
3
prevents overflow
7
loose error bounds and loss of
precision
A small integer part
7
may cause overflow
3
sharp error bounds and more
accurate computations
An arithmetic model for fixed-point code synthesis
Our new fixed-point division
The output integer part of Q
i1
.f1
/
Q
i2
.f2
may be as large as i
1
+ f
2
But, doubling the word-length is costly
How to obtain sharper a error bounds on Err
/
?
i1
f1
i2
f2
i1
f1 + k
i2
f2
•
• • • • • • •
•
• • • • • • •
/
•
• • • • • • •
0 0 0 0 0 0 0 0
•
• • • • • • •
÷
×2
k
i1 + f2
i2 + f1
•
• • • • • • •
•
• • • • • • •
i1 + f2
fr
•
• • • • • • •
•
• • • • • • •
ir
fr
•
• • •
•
• • • • • • •
•
• • •
Err
/
=
£−2
f
r
,2
f
r
¤
How to decide of the output format of division?
A large integer part
3
prevents overflow
7
loose error bounds and loss of
precision
A small integer part
7
may cause overflow
3
sharp error bounds and more
An arithmetic model for fixed-point code synthesis
Our new fixed-point division
The output integer part of Q
i1
.f1
/
Q
i2
.f2
may be as large as i
1
+ f
2
But, doubling the word-length is costly
How to obtain sharper a error bounds on Err
/
?
i1
f1
i2
f2
i1
f1 + k
i2
f2
•
• • • • • • •
•
• • • • • • •
/
•
• • • • • • •
0 0 0 0 0 0 0 0
•
• • • • • • •
÷
×2
k
i1 + f2
i2 + f1
•
• • • • • • •
•
• • • • • • •
i1 + f2
fr
•
• • • • • • •
•
• • • • • • •
ir
fr
•
• • •
•
• • • • • • •
•
• • •
Err
/
=
£−2
f
r
,2
f
r
¤
©
sharper bound
§
risk of overflow at run-time
How to decide of the output format of division?
A large integer part
3
prevents overflow
7
loose error bounds and loss of
precision
A small integer part
7
may cause overflow
3
sharp error bounds and more
accurate computations
An arithmetic model for fixed-point code synthesis
Our new fixed-point division
The output integer part of Q
i1
.f1
/
Q
i2
.f2
may be as large as i
1
+ f
2
But, doubling the word-length is costly
How to obtain sharper a error bounds on Err
/
?
i1
f1
i2
f2
i1
f1 + k
i2
f2
•
• • • • • • •
•
• • • • • • •
/
•
• • • • • • •
0 0 0 0 0 0 0 0
•
• • • • • • •
÷
×2
k
i1 + f2
i2 + f1
•
• • • • • • •
•
• • • • • • •
i1 + f2
fr
•
• • • • • • •
•
• • • • • • •
ir
fr
•
• • •
•
• • • • • • •
•
• • •
Err
/
=
£−2
f
r
,2
f
r
¤
©
sharper bound
§
risk of overflow at run-time
How to decide of the output format of division?
A large integer part
3
prevents overflow
7
loose error bounds and loss of
precision
A small integer part
7
may cause overflow
3
sharp error bounds and more
An arithmetic model for fixed-point code synthesis
The propagation rule and implementation of division
Once the output format decided Q
ir
.fr
/
. .
Val
(v ) = Range(Qir
.fr ) = [−2
ir −1,2ir −1 −2fr ].
Err(v ) =
Val(v2)·
á
Err(v1)−Val(v1)·
Err(v2)
á
Val(v2)·
³
Val(v2)+
á
Err(v2)
´
+
Err/
Val(v1) Err(v1) Val(v2) Err(v2)à
Val
(
v
2
)
=
Val
(
v
1
)
â
Val
(
v
)
+ Err/
∩ Val
(
v
2
)
and â
Val
(
v
)
=
[
−2
i
r
−1
, −2
−f
r
]
∪
[
2
−f
r
,2
i
r
−1
− 2
f
r
]
Additional code to check for run-time overflows
An arithmetic model for fixed-point code synthesis
The propagation rule and implementation of division
Once the output format decided Q
ir
.fr
/
. .
Val
(v ) = Range(Qir
.fr ) = [−2
ir −1,2ir −1 −2fr ].
Err(v ) =
Val(v2)·
á
Err(v1)−Val(v1)·
Err(v2)
á
Val(v2)·
³
Val(v2)+
á
Err(v2)
´
+
Err/
Val(v1) Err(v1) Val(v2) Err(v2)à
Val
(
v
2
)
=
Val
(
v
1
)
â
Val
(
v
)
+ Err/
∩ Val
(
v
2
)
and â
Val
(
v
)
=
[
−2
i
r
−1
, −2
−f
r
]
∪
[
2
−f
r
,2
i
r
−1
− 2
f
r
]
i n t 3 2 _ t
div
(
i n t 3 2 _ t
V1
,
i n t 3 2 _ t
V2
,
u i n t 1 6 _ t
eta
)
{
i n t 6 4 _ t
t1
= ((
i n t 6 4 _ t
)
V1
) <<
eta
;
i n t 6 4 _ t
V
=
t1
/
V2
;
CGPE_ASSERT
CGPE_ASSERT
ret urn
(
i n t 3 2 _ t
)
V
;
}
An arithmetic model for fixed-point code synthesis
The propagation rule and implementation of division
Once the output format decided Q
ir
.fr
/
. .
Val
(v ) = Range(Qir
.fr ) = [−2
ir −1,2ir −1 −2fr ].
Err(v ) =
Val(v2)·
á
Err(v1)−Val(v1)·
Err(v2)
á
Val(v2)·
³
Val(v2)+
á
Err(v2)
´
+
Err/
Val(v1) Err(v1) Val(v2) Err(v2)à
Val
(
v
2
)
=
Val
(
v
1
)
â
Val
(
v
)
+ Err/
∩ Val
(
v
2
)
and â
Val
(
v
)
=
[
−2
i
r
−1
, −2
−f
r
]
∪
[
2
−f
r
,2
i
r
−1
− 2
f
r
]
i n t 3 2 _ t
div
(
i n t 3 2 _ t
V1
,
i n t 3 2 _ t
V2
,
u i n t 1 6 _ t
eta
)
{
i n t 6 4 _ t
t1
= ((
i n t 6 4 _ t
)
V1
) <<
eta
;
i n t 6 4 _ t
V
=
t1
/
V2
;
CGPE_ASSERT((((V & 0xFFFFFFFF80000000ll) == 0xFFFFFFFF80000000ll)
|| ((V & 0xFFFFFFFF80000000ll) == 0)));
ret urn
(
i n t 3 2 _ t
)
V
;
}
Additional code to check for run-time overflows
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11D
IVISION OUTPUT FORMATMaxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11D
IVISION OUTPUT FORMATMaxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11D
IVISION OUTPUT FORMATMaxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11D
IVISION OUTPUT FORMATMaxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11D
IVISION OUTPUT FORMATMaxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11D
IVISION OUTPUT FORMATMaxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11Maxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
Overflow rate
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11D
IVISION OUTPUT FORMATMaxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
Overflow rate
An arithmetic model for fixed-point code synthesis
The division format trade-off: case of inverting 2 × 2 matrices
Consider A =
Ã
a
b
c
d
!
with a,b,c,d ∈
[
−1, 1
]
in the format Q
2
.30
Cramer’s rule: if
∆ = ad − bc 6= 0 then A
−1
=
Ã
d
∆
−b
∆
−c
∆
∆
a
!
/
d
−
×
a
d
×
b
c
[
−1, 1
]
[
−1, 1
]
[
−1, 1
]
Q
2
.30
[
−2, 2
]
Q
3
.29
?
Q −10 .42 Q −8.40 Q−6.38 Q−4.36 Q−2.34 Q0.32 Q 2.30 Q 4.28 Q 6.26 Q 8.24 Q10 .222
−362
−312
−262
−212
−162
−11Maxim
um
exper
imental
error
0%
20%
40%
60%
80%
100%
Maximum error
Overflow rate
An arithmetic model for fixed-point code synthesis