HAL Id: lirmm-00153366
https://hal-lirmm.ccsd.cnrs.fr/lirmm-00153366
Submitted on 10 Jun 2007HAL is a multi-disciplinary open access
archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Carry Prediction and Selection for Truncated
Multiplication
Romain Michard, Arnaud Tisserand, Nicolas Veyrat-Charvillon
To cite this version:
Romain Michard, Arnaud Tisserand, Nicolas Veyrat-Charvillon. Carry Prediction and Selection for Truncated Multiplication. SiPS’06: Workshop on Signal Processing Systems, Oct 2006, Banff, Canada, pp.339-344. �lirmm-00153366�
Carry
Prediction
and
Selection
for
Truncated
Multiplication
Romain Michard*, Arnaud Tisserandt and Nicolas Veyrat-Charvillon*
Arenaire project, LIP (CNRS-ENSL-INRIA-UCBL) Ecole Normale Superieure de Lyon
46 all&e d'Italie. F-69364 Lyon, France
{firstame.lastname}@ens-lyon.fr
t Arith group, LIRMM (CNRS-Univ. Montpellier II)
161 rue Ada. F-34392 Montpellier, France
arnaud.tisserand@lirmm.fr Abstract- This paper presents an error compensation method
for truncated multiplication. From two n-bit operands, the operator produces an n-bit product with small error compared to the 2n-bit exact product. The method is based on a logical computation followed by a simplification process. The filtering parameter used in thesimplification process helps to control the trade-off between hardware cost and accuracy. The proposed truncated multiplication scheme has been synthesized on an FPGA platform. It gives a better accuracy over area ratio than previous well-known schemes such as the constant correcting and variable correcting truncation schemes (CCT and VCT).
I. INTRODUCTION
In many digital signal processing applications, fixed-point arithmetic is used. In order to avoid word-size growth, op-erators with n-bit input(s) must return an n-bit result. For multiplication, the 2n-bit result of an
(n
xn)-bit
product hasto be set backto n-bitby dropping then leastsignificantbits
through a reduction scheme (usually truncation orrounding).
This is the purpose of truncated
multipliers.
Truncated multiplication is used mainly for applications such as finite-impulse response (FIR) filtering and discrete cosine transform (DCT) operations. It can also be used to
reduce the hardware cost of function evaluation [1].
This paper starts by the notations and presents the main methods used for truncated multiplication in Section II. In
this section, we introduce a simple classification of truncated
multiplication schemes. The proposed method is presented
in Section III. Ourmethod is based on carry prediction and selection. Section IV presents the error analysis and the
im-plementations results onFPGAs. Italsopresents acomparison
with some existing solutions.
II. BACKGROUND
A. Notations
Figure 1 presents the partial product array (PPA) of an
unsigned
(4
x4)-bit
multiplication (see [2] for full-widthmultiplication algorithms). The partial product xiyj is often
represented as a dot for compact notation (see Fig. 2).
We use fixed-point notation with n fractional bits (i.e.,
O°X1X2X3
...X* ) for the operands and the result. As shown inX3
x Y3
XYo X3Y1 X2Y1
X3Y2 X2Y2 X1Y2
+ X3Y3 X2Y3 XlY3 XoY3
X2 Y2 X2Yo XoY2 x1 Yi XlYo xoyl xo Yo xoyo P7 P6 P5 P4 P3 P2 P1 Po
Fig. 1. Partial productarrayofa (4x4)-bit multiplication.
Figure 2, MP represents the n -1 most significant columns of the partial product array. MP corresponds to the n bits of the final truncated result.
Wl,b
is the weight of the leastsignificant bit in the truncated result, i.e.,
Wl,b
= 2'. The least significant part of the PPA is noted LP, and we furtherdistinguish its k most significant columns as
LPmajor,
and the remaining n-k columns asLPmirior.
In some schemes,the column inLPminor with thehighest weight (theleft-most column in LPminor in Figure 2) is used. It is referred in the
following
asLPm
i)
or. We refertoa column in the PPAby
its weight, forexample MP extends from columnscoll
tocoln,
i.e. from the column where thepartial products weight
2-1
to the column where the partial products weight2-n =wllb.
Function
truncn
(x)
denotes x truncated to the n-th bit,androundn(X)
stands for x rounded to the n-th bit.A truncated multiplication scheme computes the partial products inMP, and add an errorcompensationvalue (ECV) computed as a function of LP. The result of a truncated
multiplication is noted
P=
truncn
(MP+f
(LP)).The most obvious way ofperforming atruncated
multipli-cation is firstto compute the exact 2n bits of the result, then round it to n bits. This full-width result is
PFW
=roundn
(MP
+LP).
While givingthe smallestpossible error,which is onlydue
to therounding, this method alsorequiresthe highestamount
Lp X
~~~~~computed,
te constant C iS based onthe LP part.117
FmaoTLP;4
LPmlo arecomputed,
the constanlt C is based on the,/
~~~LPio
Prt,
/ MiP .
~~~~~~~~~Dynamic
ECV withf
(LPmajor):
al:lpartia:l products
/ /
~~~~~~~~~~~from
LPmajo,
arecomputed
and used to evaluate the|/' ~~~~~~~correction due to LP.
.Dynamic
ECV withLPma
io,
+f
(LPrninor)
all par-tialproducts
fromLPmajo,
arecompu.ted,
somepartia:l
eva:lu.ate the ECV.
< > <
i
Dynamic
ECV with LP:al'lpartial produ.cts
inrLP aren k
~~~~~~~~~~computed
and used to compute the ECV.Fig.2. Thedifferenltpartsofapartial productarray. Table I presents the classification of the
previous
methods and,our method,accord.;ng:ly
to those groups.Of hard.ware
'by
computing
a'l'l thepartial produ.cts.
C. Static ECV: CSince the sum bits in the 2Tn-bit full-width
product
are not I 6,teepce au fC=Sm(P setmtdall
used,
one iStempted.
to remove somelow-weight
columlns byas.mn
tha eac bi ofteipt a rbbltin LPinl ordertodilminilsh the hard.ware costofthee
multiplier..
Howeer,by dingthis th cariesin he lw-wightpar
ofof
beinrlg
one. Theprobabilities
of each carry and sum bitareEIowever~
~ ~ ~ ~
~~~~~:
hstecrlsmtelwwlbydlrL ato evaluatedusing
thelogic
properties
ofte
half-adderanLd
full-tePPA arelost,
tereby introducing
an evaluation error. adrclsTwo indsoferor ccurin runcted ultilicaion-the Thedirect-trunlcated
multiplication
scheme describedprevi-evaluation error
E,aCl,
which iS due to the columns that areously
also fits inthis
categorywith
C=O,
LPiS
notcomputed
removed inLP,
and the truncatio:n errorEt,r,n,,
whichoccurs no aprxmae.whenl the
computed
valu.e of the PPA iS reduced. to anl n-bitva:lu.e. D.
Staltic
ECV:
LPmajo,
+CA d.irect-tru.ncated.
mu.Itiplier
computesolnly
the n- Imost
siglnificanlt
co:lu.mns of the VPPA. Whileminlimizilng
the Truncated.multip:licationl
schemes with a static ECVap-required
amoulntofhardware,
thisapproach
does nrlottalk-e inlto proximate the error doneby leavinlg
out the'low-weight
account
an:y
of the carriespropagating
fromLP,
anLd leads columnsLPmino,
with a co:ntstanlt, which iscomputed
eitherto amaximal evaluationrlerror. The result ofadirect-trunrlcated
by
exhaustive search o:n thei:nput
values or as a statisticalmu.ltiplier
isevaluatio:nL
of theexpected
value ofLPmin,,r I:nL
order toas anl extenlsionl of M/P. The
resu.ltinlg
(n
+k)-'bit
va:lue is B.Clazssificaltion?
of
the TrPuncaltedMultiplicaltion
Schemesteln rounlded
ortrunrlcated
to nz bits.Iln
[3]
everyilnpu.t
'bit is assu.med. to have aprobability
The tru:nLcated
multiplicationrl problem
is the trade-off be- 2tweeaccracandhardarecost Atone ide,thee isthe of
beinlg
onle. Eachpartial
product
therefore has alnexpected,
twee:nL
~~
~ ~ ~ ~ ~ ~ ~ ~~Vau
acurcByL
adrdwarthei.
exece valuesside,LFhereisthim
full-width
multiplier
with the best accuracy but thehighest
vle4-
yadgterepctdvle vrL 'cost. Atthe other
side,
there iSthe direct-truticatedmu.ltiplier
gt h xetdvleo h vlainerrwith the lowest cost but the worst accuracy.
VM[any
truncated nl-k-Itrade-offs. 4 i=o
Wepropose a
simple
classificationr basedonthecomplexity
Th mutpictore.ts
of te ECV
computationl
scheme. Wedisti:nguish
two mainrLkinlds of solutionls staltic ECV and. d
ynamic
ECV. Static ECV P =r1roud1 (M7AP_LTPmajo__rounn+(-EI_Tvahl)meansthat the correctionl value doesnot
depenld
onlthe actualva'lu.es of the
operand.s
(the
va'lu.e is fixed, atd.esign
time).
anldtheparamreter
k is chosenL so as togive
to the evaluationLDynrlamic
ECV uses a correctionrl valuecomputed usi:nLg
the error a varianLce lower thanl the variancew2
Isb/12
of theStaticECV DynamicECV
c LLPmajor+ C f(LPmajor) LLPmajor+f(LPminor) 1 LP direct-truncated [3] [4] [5] full-width
[6] [7] [8] [9], [10]
[11] ourmethod TABLE I
CLASSIFICATIONOF THEECVMETHODS USED IN LITERATURE.
one. This gives:
Etrunc
Wlsb:2
2
i=-k
Wlsb(1 2
vice versa. The truncation error
Etrunc
is thesame as the onedefined in the CCTmultiplication.
The multiplication result is:
2-k).
The multiplication result is: P=truncn(MP +
LPmajor
+round,+k
(-Eeval- Etrunc))E. DynamicECV: f
(LPmajor)
In order to further diminish the error, some schemes have been proposed where, instead ofapproximating the value of the partialproducts in LPminor bya constant, it is expressed
as afunction of the partial products in
LPmajoor
[4] gives an ECV for a modified Booth encoded PPA, where each line of partial products in LPminor is estimated as amultiple of thecorresponding partial productin LPmajor (k = 1). This results ina data-dependentECV.
[8] presents another dynamic ECV for a modified Booth
multiplier. For every possible combination of bits in the recoded operand, acorresponding expected value of
LPminor
iscomputed bystatistic analysis, and addedtotheexactvalue ofLPmajor(k = 1). Thisgives for everypossible value of the recoded operand an approximation of the carries propagated
from LP. A carry generation circuit is then computed using
a Karnaugh map. For sizes larger than 12, the exhaustive simulation is replaced by statistical analysis.
F DynamicECV: LPmajor+f
(LPminor)
In [5], the ECV for a Baugh-Wooley array multiplier is
computed in three parts. First LPmajor is computed and summed. Then thepartialproducts in
LP(h)
minor arecomputed,
some of themare inverted, and all this is summed. Thepattern
of inversions applied to the partial products in LP(h)minor is
parametrizedby aninteger Q. The sum is noted
OQ,k. Finally
the expected value of
(LPminor-
OQ,k)
isestimated,
and added do LPmajor +OQ,k.
The best value ofQ is obtainedby exhaustive search. For n > 16, astatistical
analysis
canbeperformed.
The variable correction truncated (VCT) multiplication [9], [10] estimates thecarriespropagatedfrom
LPminor
by addingto the least column of
LPmajor
the partial products ofLP(h)
mrninor. This is equivalent to multiplying thesepartial-products by two. An immediate consequence is that the ECV is minimal when the multiplicationoperands areminimal,and
P=truncn (MP+
LPmajor
+2LP(ihor
+roundn+k
(-Etrunc))
An hybrid correction truncation (HCT) multiplication [11]
realizes a compromise between the CCT and VCT
multipli-cation by only using a percentage p of the partial products
in
LPminor
forthe ECV, and adding 1 -p of the evaluationerror
Eeval,
defined in the CCTmultiplication. The truncation errorEtrunc
is also the same as in the CCT and VCTmultiplications.
The multiplication result is:
P=
truncn
(MP+LPmajor
+P*2LPihor
+roundn ((p- 1) Eeval -Etrunc))
G. DynamicECV: LPmajor +LPminor
Only the full-widthmultiplierfits into this category: this is the casewhere all of LP is computed.
III. PROPOSED METHOD
Inthiswork, a newdata-dependenttruncatedmultiplication
scheme is introduced. Itis namedprediction-selection
correct-ingtruncated (PSCT) multiplication. It is proposed for direct non-recoded unsigned array multiplication.
In the CCT, VCT and HCT multiplication schemes, the carries propagated from LPminor are estimated, either by
statistical analysis orwith the help of the partial products in column LPm
i)
or. It is then difficult to know what kind oferror is done, and what additional terms mightbe introduced inorder to improve accuracy.
Our approach tries to address this issue
by
computing
ina first time the exact values of every carry
generated
inLPminor,
and then discarding the less probable ones. This scheme simplifiesthe computation of the ECV and lower the associated hardware cost, while keeping track of the errormade byremovingthose products.
A. Carry Prediction
Consider acompletePPA asthe oneused for the full-width
multiplication in Figure 2. Since the n -k least significant
bits of the result are discarded, the
corresponding
sum bits ofLPminor do not have to be computed. But ifLPminor is_A11P
LPmaor
LPminor
(e)
Finally
xoxiyoy,
andB(x,y)
areremoved.
and.theirc --->
~~~~carry
C(x,
y)
is added toc015s
inlLPmajor-X3YO X2Yo XIYO XOYO'
3Y1 2Y11Y1So1 We lLote LPct
joranld
Lprninor
the parts ofte
PPAX3Y3 X2Y3 X1Y3 XoY3 ofthe
prediction
process.LP
Al:l
thepartia:l
produ.cts
left inLP,nin7f0
are sumbits,
whichdo :not :need to be
computed,
sinLce every carryoriginLally
XYo X2Yo XIYO =9t: X3YO X2Y0
genlerated
inrl
LPnino,
islnowcomputed
inrl
Lpmajo,.
X2YI XlYl XOYI (C 2Y1 xiYI-tV After the carry
predi;ction
process,LPmlinor
contailnsolnly
(b)
X1Y2 XOY2 ()X1Y2 XOY2 onepartial product in each column.Assumingthat each resultXoY3 XOY3 XzOXIYOYI bit of the
mrultiplicatioln
has aprobability
I ofbeinrlg
olne,
te
expected
value ofte evaluationerrorE,atl
is lowerthanlX3YO Irl X3YO
~
2-k-1WIsb,
so thatroundnl+k(-Eevala)
= OX2Y1 _& X2Y1
Bt,7)'
(d)
X1Y2 _Il(e)
X1Y2 We canreplace
rou,ndn+k(LP)
by
LPrnajo,
XoY3 XoX1YoY1 XOY3
A(X Y) B(XY) A(X,Y)
~~PPS
=roundn
(MNP
+LPmajor)
A(x,
y)
=XoX1Y1Y2
VXOX2YoY2VX1X2YoY1=
PFB(x,
Y)
=(XOY2)
(D(XIYI)
(D(X2Yo)
At thispoint,
te truncated operator we obtain gives theC(z, y)=x:UzyOyl AB(S, Y)= x:UsyoylAsame result as the full-width
multiplication
Fig. 3. The fourstepsof carry
predictioln
forn2=4anld.k= :1. B.Carry
SelectionSo
far,
the evaluation error is lower than2-k-
IW1Sb,
butte
partial
products
inLPmC,fj,,
become verycomplicated
not
implemelnted.
ata:ll,
al:l thecarriesgenerated.
there areltost,
as n grows,and.
altarge
hard.ware
cost mayresu.lt.
fInr
ord.er
leading
to the evaluationr error described.inEq.
(1).
toredu.ce
the area requirements of thetruncated.
multip:lier,
In order to
keep
te evaluationrl error low whileremovinrlg
some carries havetobesimplified.
Forthat purpose, thelogic
u:nnecessary hardware,
onl thelogic
formulas ofthe carriesformulas
are written under theirsimplified disjunctive nrlormral
generated
inLPnino,r
havetoactually
beimplemented.
Theseform,
ItL
;itr sapidon h atclm nL/¢oexpressions
are obtained.by replacing
theft:ll-add,er
and,ha:lf- This consists inremovinlg alny conljunlctionl
w:hich
nlumber
of add.er cells inlLPninUo, by
theirrespective
flogic
definitionls: variablesexceed.s
agiven thresho:ld.
t.Assu.milng
that theinpu.t
bits have aprobability
2 ofbeinlg
onle,
aconljunrctionl
oft+IcarryFA(,,c) = ctbV ctc V
bcvariables
willhaveaprobability
2--
ofbeinrlg onle.
ApplyinLg
SUMrFA
(a,I
b, c)
= at( b(Dc tefilter,
onLe :makes anL
errorof aboutTnx2-k-t-
1WIsb,
wherecarrYHyA(a, b)
= atb m is thelnumber
ofcolnjunctiolns
that were removedinl
the SUMHA(a, b)
=aX3b process. The evaluationerror
E,afl
isupdated.
in accordancewhere ab means "aand,
b",
aVb is ";a or b" and a( tad to this value. Themultiplication
resuilt
is:
for "a exclusive-or b". PP
onn(P+L'ajr+
round+(-ea)(2
It is
possible
to express telogic
fo:rmulas of the carries Pp =ronn(AIP+Lnaonk Evl)(2through
LPnino,r
Onceimplemelnted.,
the carriesgenerated,
infLet
u.s
conlsider
thepreviou.s examp:le (:Figu.re
3)
ofanl
n-bitLPnir,o,
are knownexactly.
tru.ncated.
mu.ltiplication
with n = 4and,
k = 1. Theolnly
Figure
3 showste carrypredictio:n
steps fora 4-bit :multi-columrn
inLLpmajorcoa:s
y:***,oy A(y)adplicatioln
withl one columln inLPnajo,
(a).
C(x,
y),
whereC(x, y)
=XOXIX2YoYIY2
VXoXIX2SYOYIY
in .(b)
The firststep
computes te carriesgenerated
in theits
disjunrctive
nrormal formr.
least
signrlificant
columnofLPnino,,
c018.
Since there isI:f
nlo threshold. isimposed.,
the truticated.mu.ltip:licatioln
isonly
onepartial product
tere,nocarry canLbegenerated.
equivalelntt
to te ideal roundedmultiplication
.
(c)
The carrygenerated.
inC017
is added toC016.
This For a value oftethreshold
t = 4, bothconjunctio:ns
inMultiplication scheme Direct-truncated Ideal rounded PSCTk = 1, t= 6 PSCT k= 1, t=4 PSCT k= 1, t= 2 CCTk= 1 VCT k=1
131
7.66e-1 6.25e-2 6.25e-2 4.69e-2 3.12e-2 3.12e-2 3.44e-1 avag 7.66e-1 2.34e-1 2.34e-1 2.36e-1 2.69e-1 2.69e-1 3.59e-1 a 6.19e-1 1.68e-I 1.68e-I 1.7le-I 2.15e-1 2.15e-1 2.8le-1Relative size vs. average errorfor an 8-bit precision
evmax 3.06 5.00e-1 5.00e-1 5.62e- 1 1.06 1.06 9.37e-1 TABLEII
ERRORANALYSIS OF SEVERAL TRUNCATED MULTIPLICATION SCHEMES FORn=4
IV. RESULTS A. Mathematical Error
For each studied multiplier scheme, the absolute bias
QS3,
average absolute error Eavg, standard deviationor and absolute maximumerror£max are given in output lsb
Wl,b
22. They are computed exhaustively for n < 8, and using an extensive random sampling for larger values ofn.The mathematical data for the previous example is given in Table II. We can see that, by acting on the value of the threshold t, we realize a compromise between the full-width multiplier and the CCT multiplier.
B. Synthesis
We studied the implementation of truncated multiplica-tion schemes on FPGAs. The CAD tool used was Xilinx ISE8. Ii and the target was an FPGA of the Spartan 3 family (XC3S200) with a medium speedgrade (-5). Synthesis and place-and-route were area-oriented with a standard effort. The multipliers were implemented using LUTs (not hardware block multipliers).
The Xilinx devices are optimized for 4 up to 8-bit input functions. This allows us toperform an efficient implementa-tion of PSCT multipliers with a threshold up to t = 8. We implemented the PSCT multipliers for t running from 3 to 8. A PSCT multiplier where t = 2 is equivalent to a CCT
multiplier with the same value of the parameter k. C. Comparisons
Thecomparisons are lead with some well known truncation schemes for direct multiplication, that is the CCT [7] and VCT [9] multiplications. The full-widthmultiplierand direct-truncated multiplier are used as areference.
The comparisons were done for n = 8, 12 and 16. Our method is not yet fit for higher values of n, because of the fastgrowing computational cost of the prediction process.
Figure4shows how the different schemes behave for n = 8,
12 and 16 from to top bottom. The X-axis gives the average absolute error, which is our principal accuracy criterion. The Y-axis gives the hardware cost relatively to the full-width
multiplier. The aim is to perform a good accuracy while
minimizing the hardware cost. This corresponds to the lower left part of each graph.
Forn = 8,the CCT isoutperformedby the PSCT for t = 3,
4 and 5. One can computewith the same average accuracy as
a) 2-E 0 -o a) E 0 C) a) N U) 0 a) ,a) E 0~ N 0) Qr 1.15 1.1 1.05 0.95 0.9 0.85 0.8 0.75 1.05 0.95 0.9 0.85 0.8 0.75 0.95 0.9 0.85 0.8 0.75 0.25 0.3 0.35
AverageerrorinWIsb
0.4 0.45
Relative size vs. average errorfor a122bitprecision
X
-. ,
0.25 0.3 0.35
AverageerrorinWIsb
0.4 0.45
Relative size vs. average errorfor a166bitprecision
9.
U
0.25 0.3 0.35
AverageerrorinWIsb
CCT I PSCT,t=4
VCT X PSCT, t=5 PSCT, t=3 * PSCT, t=6 A
0.4 0.45
PSCT, 8
Fig. 4. Relative sizevs.average errorfortruncated multiplication schemes.
the CCT withsmaller PSCTmultipliers. Similarly, forn= 12,
the PSCT fort =3 requireless hardware toprovide thesame average accuracy as the CCT. For n = 16, the two schemes
are equivalent.
Tables III, IV and V show accuracy results for the
trun-catedmultiplication schemes. Ifonewants to get an average accuracy as small as possible, that is get close to 0.25, the
PSCTmultiplication has a lower hardwarecostthan the other
truncated multiplication methods.
Multiplicationscheme Ideal rounded Direct-truncated CCTk =4 VCTk =4 PSCTk =4, t= 3 PSCTk =4, t= 5 avag 2.49e- 1 1.75 2.52e-1 2.5le-1 2.50e-1 2.49e-1 a 1.46e-1 9.76e-1 1.5 le-1 1.50e-1 1.48e-1 1.47e-1 emax 5.00e-1 7.00 5.66e-1 5.66e-1 6.29e-1 5.5le-1 area 48 32 47 46 45 46 delay 10.3 8.9 10.8 11.4 10.2 11.7 TABLE III
ACCURACYRESULTS FOR8-BITTRUNCATED MULTIPLICATION SCHEMES
Multiplicationscheme Ideal rounded Direct-truncated CCTk =5 VCTk =4 PSCTk= 5, t= 3 PSCTk=5, t= 4 PSCTk= 5, t= 5 Eavg 2.49e- 1 2.73 2.5le-1 2.52e-1 2.5le-1 2.50e-1 2.49e-1 a 1.45e-1 1.24 1.48e-1 1.49e-1 1.48e-1 1.46e-1 1.46e-1 Cmax 5.OOe-1 9.00 5.7le-1 6.03e-1 6.56e-1 5.94e-1 5.78e-1 area 93 53 85 93 84 88 91 delay 12.7 11.5 12.3 14.6 12.3 14.3 14.3 TABLEIV
ACCURACY RESULTS FOR 12-BIT TRUNCATED MULTIPLICATION SCHEMES Multiplication scheme Ideal rounded Direct-truncated CCTk =6 VCTk =4 VCTk =5 PSCTk=5, t= 4 PSCTk=5, t=6 'avg 2.49e- 1 3.76 2.5le-1 2.52e-1 2.50e-1 2.52e-1 2.5le-1 a 1.45e-1 1.48 1.47e-1 1.50e-1 1.46e-1 1.50e-1 1.49e-1 Emax 5.OOe-1 12.5 5.62e-I 6.36e-I 5.59e-1 6.4le-1 6.26e-1 area 162 95 140 149 153 140 142 delay 15.2 13.2 15.0 15.3 15.3 16.0 18.2 TABLEV
ACCURACY RESULTS FOR16-BIT TRUNCATED MULTIPLICATION SCHEMES
CONCLUSION
We presented a new truncated multiplication scheme. The method first computes the logic expression of the carries
propagatedfrom
LPminor,
thenperforms simplificationswhile keeping control over the introduced error. This scheme achieves an improvement both for accuracy and hardwarerequirements over previous schemes. The proposed method has been implemented on FPGAs, it shows an area reduction for comparable accuracy on 8 and 12-bitmultipliers.
Ina nearfuture we plan toimprove the speed of our method inorder to deal with largermultipliers. We also plan to study the effects of differentgroupings of thepartial productsduring the carry prediction phase, that should lead to accuracy and hardware cost improvements.
REFERENCES
[1] E. G.Walters andM. J.Schulte, "Efficient function approximation using truncatedmultipliers andsquarers,"in Proc.ofthe 1 7th IEEE Symposium onComputer Arithmetic. IEEE Computer Society, June 2005, pp. 232-239.
[2] M. D.Ercegovacand T. Lang,DigitalArithmetic. MorganKaufmann,
2003.
[3] Y C.Lim, "Single-precisionmultiplier with reduced circuit complexity forsignal processing applications,"IEEE Transactions on Computers, vol. 41, no. 10, pp. 1333-1336, Oct. 1992.
[4] T.-B. JuangandS.-F.Hsiao,"Low-errorcarry-free fixed-width multipli-erswith low-costcompensationcircuits,"IEEETransactions onCircuits and Systems-HI:Analog andDigital SignalProcessing,vol. 52,no.6,
pp. 299-303, June 2005.
[5] L.-D. VanandC.-C. Yang,"Generalized low-error area-efficient fixed-width multipliers," IEEE Transactions on Circuits and Systems-I: RegularPapers, vol.52, no. 8, pp. 1608-1619, Aug. 2005.
[6] S. S. Kidambi,F. El-Guibaly, and A. Antoniou, "Area-efficient multi-pliers for digital signalprocessing applications,"IEEETransactions on Circuitsand Systems-II:Analog and Digital SignalProcessing,vol. 43, no.2, pp. 90-95,Feb. 1996.
[7] M. J. Schulte and E. E. Swartzlander, "Truncated multiplication with correctionconstant,"in IEEEWorkshoponVLSISignal Proccessing VI, Oct. 1993,pp. 388-396.
[8] K.-J. Cho, K.-C. Lee, J.-G. Chung, andK. K. Parhi, "Design of low-errorfixed-width modified boothmultiplier,"IEEETransactions onVery LargeScaleIntegration(VLSI) Systems, vol. 12,no.5, pp.522-531,May 2004.
[9] E. J.King andE. E. Swartzlander, "Data-dependent truncation scheme for parallel multipliers," in Proc. of 31th Asilomar Conference on Signals, Systems & Computers, vol. 2. IEEE,Nov. 1997, pp.
1178-1182.
[10] E. E. Swartzlander, "Truncatedmultiplication with approximate round-ing," in Proc. of 33th Asilomar Conference on Signals, Systems & Computers, vol.2. IEEE,Oct. 1999,pp. 1480-1483.
[11] J. E.Stine and0. M. Duverne,"Variationsontruncatedmultiplication," in Proc. EuromicroSymposiumonDigital SystemDesign(DSD). IEEE, Sept. 2003, pp. 112-119.