
High Radix BKM algorithm with Selection by Rounding


HAL Id: hal-02545612

https://hal.archives-ouvertes.fr/hal-02545612

Submitted on 17 Apr 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


High Radix BKM algorithm with Selection by Rounding

Laurent-Stephane Didier, Fabien Rico

To cite this version:

Laurent-Stephane Didier, Fabien Rico. High Radix BKM algorithm with Selection by Rounding. [Research Report] lip6.2002.009, LIP6. 2002. ⟨hal-02545612⟩

Laurent-Stephane Didier, Fabien Rico

Laboratoire d'Informatique de Paris 6

January 21, 2002

Abstract

We present in this paper a high radix implementation of the BKM algorithm. This is a shift-and-add, CORDIC-like algorithm that allows fast computation of the complex exponential and logarithm. The improvement lies in fewer iterations for a given precision and in the reduction of the size of the lookup tables for high radices.

Keywords: Elementary function, CORDIC algorithm, Computer Arithmetic, High Radix

1 Introduction

Several classes of algorithms for computing the complex exponential and logarithm have been proposed. A first class is composed of algorithms using a polynomial approximation of the function to be computed [Tan89, Smi89, Mul97, DMF00]. Generally, few iterations have to be done by this kind of algorithm, but multiplications are required. Next, multipartite table methods, which can be implemented for low precision, require very few arithmetic operators [SM95, HT95, dDT00]. In contrast, quadratic convergence algorithms allow better precision, but are complex and difficult to implement [Bre76, BB84]. Such algorithms have the best asymptotic complexity, but become really efficient only for thousands of bits. Finally, shift-and-add algorithms, the most famous of which is CORDIC [Vol59, Wal71, DM96], allow fairly good precision (up to hundreds of bits) with low-cost iterations [Mul97]. Similarly to the CORDIC algorithm, the BKM algorithm allows the evaluation of many elementary functions [BKM94, BI99]. In order to reduce the number of iterations for a given precision, high radix implementations have been proposed for CORDIC [ALB00a, ALB00b, Lew99].

This paper presents a high radix implementation of the BKM algorithm for computing the complex logarithm and exponential without scaling factor. We show a method for choosing the digits and a convergence domain for the BKM algorithm in high radix. A first method using the radix $\beta = 10$ has been presented in [IMR00], but this method uses a lot of memory. Indeed, such an algorithm needs a number of digits of the order of magnitude of $\beta^2$. As each iteration needs to store one complex number in a table per digit, it uses an $O(N\beta^2)$-entry table (where $N$ is the final number of iterations), while the CORDIC algorithm only uses an $O(N\beta)$-entry table (see [ALB00a] and [ALB00b]).

The paper is organized as follows. We first briefly present the initial version of the BKM algorithm. Next, we propose a modification of the iteration used in this algorithm, designed to solve the principal problem of high-radix BKM, which is the memory cost. Thereafter, we examine the two modes of BKM: E-mode (section 3) for computing the complex exponential and L-mode (section 4) for the complex logarithm. Each section presents the digit selection method, the domain of convergence and an example of argument reduction to this domain. Finally, in section 5 we compare our algorithm with the other high radix CORDIC implementations and conclude.

1.1 Notations

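Let us consider a complex number $z$. We denote its real part by $z^x$ and its imaginary part by $z^y$. Thus

$$z = z^x + i z^y$$

We call Gauss numbers the complex numbers whose real and imaginary parts are integers:

$$d = d^x + i d^y \quad \text{with } d^x, d^y \in \mathbb{Z}$$

We denote by $\lceil z \rfloor_p$ the value of $z$ whose real and imaginary parts are rounded to the nearest with $p$ fractional digits. For instance, $\lceil z \rfloor_0$ is the Gauss number closest to $z$.

We denote by $\langle z \rangle_p$ the value of $z$ whose real and imaginary parts are truncated down with $p$ fractional digits.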

1.2 The bkm algorithm

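The BKM algorithm presented in [BKM94] computes the following iterations:

$$\begin{cases} E_{n+1} = E_n \left(1 + d_n \beta^{-n}\right) \\ L_{n+1} = L_n - \ln\left(1 + d_n \beta^{-n}\right) \end{cases}$$

where $\beta$ is an integer and the values $d_n$ are chosen so that:

$$d_n = d_n^x + i d_n^y, \qquad d_n^x, d_n^y \in \{-a, -a+1, \dots, 0, \dots, a-1, a\} \quad \text{with } a = \frac{\beta}{2} + 1 \qquad (1)$$

In this paper, the $d_n$ values are called digits and the integer $\beta$ is the radix of the algorithm.

In order to compute one iteration, the values $\ln\left(1 + d_n \beta^{-n}\right)$ are stored in a lookup table and are accessed by $d_n$. The size of this table directly depends on the number of possible digits. Thus, each iteration only needs a shift, an addition and a multiplication by a digit $d_n$.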

It is shown in [Mul97] that these iterations preserve the following equalities:

$$\frac{E_n}{E_1} = \frac{\exp(L_1)}{\exp(L_n)} \qquad \text{and} \qquad L_n - L_1 = \ln(E_1) - \ln(E_n)$$
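As an illustration of these equalities, the following minimal sketch applies a few iterations with arbitrary digits and checks numerically that the product $E_n \exp(L_n)$ does not change. The function name `bkm_step`, the starting values and the digit sequence are hypothetical choices made for this example only; they are not taken from the report.

```python
import cmath

def bkm_step(E, L, d, beta, n):
    """One iteration: E <- E*(1 + d*beta**-n), L <- L - ln(1 + d*beta**-n)."""
    factor = 1 + d * beta ** (-n)
    return E * factor, L - cmath.log(factor)

# Hypothetical starting point and digit sequence (radix 2), for illustration.
E1, L1 = complex(1.0, 0.0), complex(0.3, 0.2)
E, L = E1, L1
for n, d in enumerate([1, -1 + 1j, 1j, 0, 1 - 1j], start=1):
    E, L = bkm_step(E, L, d, beta=2, n=n)

# E_n * exp(L_n) should equal E_1 * exp(L_1) up to rounding errors.
invariant_start = E1 * cmath.exp(L1)
invariant_end = E * cmath.exp(L)
print(abs(invariant_end - invariant_start))
```

The invariant holds for any digit sequence; the digit selection only decides which of $E_n$ or $L_n$ is driven to its target value.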

As a consequence, it is possible to use this algorithm through two different modes:

• E-mode: if we choose $d_n$ so that $L_n$ converges to 0, then $E_n$ will converge to $E_1 e^{L_1}$, making possible the computation of the complex exponential.

• L-mode: similarly, if we choose $d_n$ so that $E_n$ converges to 1, then $L_n$ will converge to $L_1 + \ln(E_1)$, which permits the computation of the complex logarithm.

Initially this algorithm was developed in radix 2, which simplifies the choice of the digits $d_n$. However, this algorithm gives one digit of the result at each iteration. For instance, a radix 2 algorithm gives only one bit of the result at each step. As a consequence, a high radix implementation of the BKM algorithm would need fewer iterations for the same precision.

Unfortunately, three problems appear: the complexity of the digit selection, the numerical instability of the first iterations and the size of the tables for high radix implementations.

It has been shown in [IMR00] that for radix 10 the BKM algorithm looks for digits in a set of one hundred elements. This means that each iteration requires a lookup table having one hundred entries, making this implementation inefficient for higher radices. Therefore, we will show a method for reducing the size of the tables.

2 Reducing the size of lookup tables

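In order to minimize the set of digits, we propose, similarly to [Lew99], to split each iteration into two half-iterations. The first half-iteration is designed to reduce the imaginary part. This iteration is close to the CORDIC iteration, see [Mul97]:

$$\begin{cases} \tilde{E}_n = E_n \left(1 + i d_n^y \beta^{-n}\right) \\ \tilde{L}_n = L_n - \ln\left(1 + i d_n^y \beta^{-n}\right) \end{cases} \qquad (2)$$

where $\tilde{E}_n$ and $\tilde{L}_n$ denote the intermediate values obtained after the first half-iteration. The second half-iteration has to reduce the real part. This computation is close to the BKM iteration [Mul97]:

$$\begin{cases} E_{n+1} = \tilde{E}_n \left(1 + d_n^x \beta^{-n}\right) \\ L_{n+1} = \tilde{L}_n - \ln\left(1 + d_n^x \beta^{-n}\right) \end{cases} \qquad (3)$$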

Consequently, we can deduce from these iterations that for the E-mode (respectively the L-mode):

• the digits $d_n^y$ are chosen in order to minimize only $|L_n^y|$ (respectively $|E_n^y|$),

• the digits $d_n^x$ are chosen in order to minimize only the real-part error of $L_n$ (respectively of $E_n$).

In this way, the number of iterations is doubled compared to the initial algorithm, but the number of digits is reduced by one order of magnitude because the digits are chosen from a smaller set:

$$\{-a, -a+1, \dots, 0, \dots, a-1, a\} \cup \{-ia, -i(a-1), \dots, 0, \dots, i(a-1), ia\} \quad \text{with } a = \frac{\beta}{2} + 1$$

In other words, there are only $O(\beta)$ possible digits instead of $O(\beta^2)$.
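To make the saving concrete, the short sketch below counts the digits of both sets for a given radix. It assumes an even radix so that $a = \beta/2 + 1$ is an integer; the function names are hypothetical and the radix 16 is only an example.

```python
def full_digit_count(beta):
    """Number of Gauss-number digits d^x + i*d^y with both parts in {-a, ..., a}."""
    a = beta // 2 + 1          # assumes beta is even
    return (2 * a + 1) ** 2

def split_digit_count(beta):
    """Number of digits in the union of the purely real and purely imaginary sets."""
    a = beta // 2 + 1          # assumes beta is even
    real_digits = 2 * a + 1
    imaginary_digits = 2 * a + 1
    return real_digits + imaginary_digits - 1   # the digit 0 belongs to both sets

for beta in (4, 16, 64):
    print(beta, full_digit_count(beta), split_digit_count(beta))
```

For radix 16 this gives 361 digits for the full set against 37 for the split set, which is the quadratic versus linear growth stated above.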

Now, we will show that on a small convergence domain, it is possible to choose $d_n^x$ or $d_n^y$ only by rounding the value $E_n$ (see 4.1) or $L_n$ (see 3.1), which is more efficient than choosing them from a table of values as in [Lew99]. But in order to have a functional algorithm, it is necessary to extend the convergence domain by examining closely the first two iterations of our algorithm. To this end, we will study the E-mode and the L-mode separately.

3 The computation of complex exponential : E-mode

In this mode, given $L_1$, we construct sequences of digits $d_n^x$, $d_n^y$ such that the sequence $L_n$ converges to 0. This section outlines a method for choosing the digits $d_n^x$ and $d_n^y$. This choice is critical for the convergence and the performance of our algorithm. Indeed, a complex choice will lead to a rapid convergence on a wide domain but will be costly not only in terms of computing time but also in terms of hardware resources.

Because the set of possible digits is finite, we cannot always take the best value for $d_n^x$ and $d_n^y$, and we have to select an approximation which is relatively close to the best value. Unfortunately, this will not ensure the convergence of the algorithm for the whole complex plane, but only for a small domain.

First, we present a method for choosing the digits $d_n^x$ and $d_n^y$. Next, we will prove that, for any value taken from a defined domain, an iteration will keep the result in this domain. Thereafter, we will show an argument reduction from any part of the complex plane to this domain, allowing our algorithm to converge for any input.

3.1 Digit selection

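Separating the real and imaginary parts in equations 2 and 3 gives the two following iterations (a tilde marks the value obtained after the first half-iteration):

$$\begin{cases} \tilde{L}_n^x = L_n^x - \frac{1}{2} \ln\left(1 + (d_n^y)^2 \beta^{-2n}\right) \\ \tilde{L}_n^y = L_n^y - \arctan\left(d_n^y \beta^{-n}\right) \end{cases} \qquad \text{and} \qquad \begin{cases} L_{n+1}^x = \tilde{L}_n^x - \ln\left(1 + d_n^x \beta^{-n}\right) \\ L_{n+1}^y = \tilde{L}_n^y \end{cases} \qquad (4)$$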
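The separation into real and imaginary parts relies on $\ln(1 + iu) = \frac{1}{2}\ln(1 + u^2) + i \arctan(u)$ for real $u$. The sketch below compares the two forms of the first half-iteration numerically; the function names and the sample values are hypothetical and serve only this check.

```python
import cmath
import math

def first_half_complex(L, d_y, beta, n):
    """First half-iteration written with the complex logarithm."""
    return L - cmath.log(1 + 1j * d_y * beta ** (-n))

def first_half_split(L, d_y, beta, n):
    """Same half-iteration written on the real and imaginary parts."""
    u = d_y * beta ** (-n)
    real_part = L.real - 0.5 * math.log(1 + u * u)
    imag_part = L.imag - math.atan(u)
    return complex(real_part, imag_part)

# Hypothetical sample: radix 16, iteration n = 2, digit d^y = -5.
L = complex(0.004, -0.02)
difference = abs(first_half_complex(L, -5, 16, 2) - first_half_split(L, -5, 16, 2))
print(difference)
```

Both forms agree to rounding error, so a hardware implementation can work on the two real components independently.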

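The exact values of $d_n^x$ and $d_n^y$ that minimise $|L_{n+1}^x|$ and $|L_{n+1}^y|$ are:

$$d_n^x = \left(\exp(\tilde{L}_n^x) - 1\right) \beta^n \qquad \text{and} \qquad d_n^y = \tan(L_n^y)\, \beta^n \qquad (5)$$

With these digits, $L_{n+1}^x = L_{n+1}^y = 0$. But these are not integer values except if $L_n^x = 0$ or $L_n^y = 0$. So we have to choose an integer close to these values. As $L_n^x$ and $L_n^y$ tend to 0, we can use an approximation of the exact functions in (5). Indeed:

$$\left(\exp(\tilde{L}_n^x) - 1\right) \beta^n \simeq \tilde{L}_n^x \beta^n \qquad \text{and} \qquad \tan(L_n^y)\, \beta^n \simeq L_n^y \beta^n \qquad (6)$$

Let us write

$$T_n = L_n \beta^n \qquad \text{and} \qquad \tilde{T}_n = \tilde{L}_n \beta^n \qquad (7)$$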

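Thus, $d_n^x$ and $d_n^y$ should be close to $\tilde{T}_n^x$ and $T_n^y$ respectively. In order to obtain the integers $d_n^x$ and $d_n^y$, we should round $\tilde{T}_n^x$ and $T_n^y$ to the nearest integer. This exact rounding implies exact comparisons, which may be inefficient if, for instance, $\tilde{T}_n^x$ and $T_n^y$ are coded in a redundant number representation. Therefore, we will round the truncation of $\tilde{T}_n^x$ and $T_n^y$ keeping only 2 fractional digits:

$$d_n^y = \left\lceil \langle T_n^y \rangle_2 \right\rfloor_0 \qquad d_n^x = \left\lceil \langle \tilde{T}_n^x \rangle_2 \right\rfloor_0 \qquad (8)$$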
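A minimal sketch of this selection rule follows. It assumes that the 2 fractional digits are digits in radix $\beta$ (so the truncation error is below $\beta^{-2}$) and that ties are rounded upwards; both points are assumptions of this example, and the function names are hypothetical.

```python
import math

def truncate(t, beta, p=2):
    """<t>_p: truncate t downwards, keeping p fractional digits in radix beta."""
    scale = beta ** p
    return math.floor(t * scale) / scale

def round_nearest(t):
    """Round to the nearest integer (ties go upwards in this sketch)."""
    return math.floor(t + 0.5)

def select_digit(t, beta):
    """Digit selection by rounding the truncated value, as in equation (8)."""
    return round_nearest(truncate(t, beta))

print(select_digit(3.7, 16), select_digit(-2.3, 16))
```

By construction the selected digit differs from the exact value by at most $\frac{1}{2} + \beta^{-2}$, which is the bound used in the convergence study.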

Such a choice is the result of several approximations, and we must study the convergence of the algorithm. For this purpose, we study the evolution of the sequence $T_n$. Clearly, if it is possible to prove that $T_n$ is bounded, then $L_n$ will tend to 0 and each iteration will reduce the error by a factor of $\beta$.

Suppose that the sequences $|T_n^x|$ and $|T_n^y|$ are bounded by a value $A$. Then $|L_n^x|, |L_n^y| \le A \beta^{-n}$. This means that each iteration should reduce the value of the sequences $L_n^x$ and $L_n^y$ by a factor $\beta$. We will show that if we use our digit selection method, this property is true except for the first two iterations.

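First, we have to bound the value of $T_{n+1}$ by a value depending on $T_n$. Let $a = \lceil |T_n^x| \rceil$ and $b = \lceil |T_n^y| \rceil$. We have:

$$|T_n^x| \le a \qquad |T_n^y| \le b$$

With the choice of $d_n^x$ and $d_n^y$ we obtain:

$$\left| \langle \tilde{T}_n^x \rangle_2 - \tilde{T}_n^x \right| \le \beta^{-2} \qquad \left| \langle T_n^y \rangle_2 - T_n^y \right| \le \beta^{-2}$$

and

$$\left| d_n^x - \langle \tilde{T}_n^x \rangle_2 \right| \le \frac{1}{2} \qquad \left| d_n^y - \langle T_n^y \rangle_2 \right| \le \frac{1}{2}$$

then

$$\left| d_n^x - \tilde{T}_n^x \right| \le \frac{1}{2} + \beta^{-2} \qquad \left| d_n^y - T_n^y \right| \le \frac{1}{2} + \beta^{-2}$$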

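It is well known that the functions $\ln$ and $\arctan$ have the following properties:

$$\text{if } x \in \left[0, \tfrac{1}{2}\right] \text{ then } \begin{cases} x - \dfrac{x^2}{2} \le \ln(1 + x) \le x \\[4pt] x - \dfrac{x^3}{3} \le \arctan(x) \le x \end{cases}$$

$$\text{if } x \in \left[-\tfrac{1}{2}, 0\right] \text{ then } \begin{cases} x - x^2 \le \ln(1 + x) \le x \\[4pt] x \le \arctan(x) \le x - \dfrac{x^3}{3} \end{cases}$$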
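These four inequalities can be checked numerically on a grid of sample points, as in the sketch below. The grid, the tolerance `EPS` and the function names are choices made for this example; such a check illustrates the inequalities but does not prove them.

```python
import math

EPS = 1e-15  # slack for floating-point rounding

def between(low, value, high):
    return low - EPS <= value <= high + EPS

def check_bounds(x):
    """Check the ln and arctan bounds for one x in [-1/2, 1/2]."""
    ln_value = math.log1p(x)
    atan_value = math.atan(x)
    if x >= 0:
        return (between(x - x * x / 2, ln_value, x)
                and between(x - x ** 3 / 3, atan_value, x))
    return (between(x - x * x, ln_value, x)
            and between(x, atan_value, x - x ** 3 / 3))

samples = [k / 1000 for k in range(-500, 501)]
all_ok = all(check_bounds(x) for x in samples)
print(all_ok)
```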

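By definition of $d_n^y$ and $b$, $|d_n^y| \le b$. The first half-iteration can be written

$$\begin{cases} \tilde{T}_n^x = T_n^x - \frac{1}{2} \ln\left(1 + (d_n^y)^2 \beta^{-2n}\right) \beta^n \\ \tilde{T}_n^y = T_n^y - d_n^y + d_n^y - \arctan\left(d_n^y \beta^{-n}\right) \beta^n \end{cases}$$

which gives the following inequalities:

$$-a - \frac{b^2}{2\beta^n} \le \tilde{T}_n^x \le a$$

$$-\frac{1}{2} - \beta^{-2} - \frac{b^3}{3\beta^{2n}} \le \tilde{T}_n^y \le \frac{1}{2} + \beta^{-2} + \frac{b^3}{3\beta^{2n}}$$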

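By definition of $d_n^x$ and $a$ (which is an integer), we have $-a - \frac{b^2}{2\beta^n} - \frac{1}{2} \le d_n^x \le a$. The second half-iteration is:

$$\begin{cases} T_{n+1}^x = \left( \tilde{T}_n^x - d_n^x + d_n^x - \ln\left(1 + d_n^x \beta^{-n}\right) \beta^n \right) \beta \\ T_{n+1}^y = \tilde{T}_n^y \, \beta \end{cases}$$

and gives the following bounds:

$$-\frac{\beta}{2} - \frac{1}{\beta} \le T_{n+1}^x \le \frac{\beta}{2} + \frac{1}{\beta} + \left( a + \frac{b^2}{2\beta^n} + \frac{1}{2} \right)^2 \beta^{-n+1}$$

and

$$-\frac{\beta}{2} - \frac{1}{\beta} - \frac{b^3}{3} \beta^{-2n+1} \le T_{n+1}^y \le \frac{\beta}{2} + \frac{1}{\beta} + \frac{b^3}{3} \beta^{-2n+1} \qquad (9)$$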

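Now, let us simplify the bounds of $T_n$. If $\beta \ge 10$ and:

• if $n = 2$, $a = 2$ and $b = \beta$, then $|T_3^x| \le \frac{\beta}{2} + 1$ and $|T_3^y| \le \frac{\beta}{2} + 1$,

• if $n \ge 3$ and $a = b = \frac{\beta}{2} + 1$, then $|T_{n+1}^x| \le \frac{\beta}{2} + 1$ and $|T_{n+1}^y| \le \frac{\beta}{2} + 1$.

By induction, this proves that the algorithm converges for $n \ge 2$ on the following domain: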

$$P_E = \left[ -2\beta^{-2}, \; 2\beta^{-2} \right] + i \left[ -\beta^{-1}, \; \beta^{-1} \right] \qquad (10)$$
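The role of the condition on the radix can be seen by evaluating the upper bound on $T_{n+1}^x$ exactly. The sketch below assumes that this bound reads $\frac{\beta}{2} + \frac{1}{\beta} + \left(a + \frac{b^2}{2\beta^n} + \frac{1}{2}\right)^2 \beta^{-n+1}$, as in (9), and takes $n = 2$, $a = 2$, $b = \beta$. The function names are hypothetical, and exact fractions are used because the case $\beta = 10$ is an equality.

```python
from fractions import Fraction

def upper_bound_tx_next(beta, n, a, b):
    """Upper bound on T^x_{n+1}, evaluated exactly (formula assumed from (9))."""
    beta = Fraction(beta)
    d_max = a + Fraction(b) ** 2 / (2 * beta ** n) + Fraction(1, 2)
    return beta / 2 + 1 / beta + d_max ** 2 / beta ** (n - 1)

def second_iteration_holds(beta):
    """True if the bound for n = 2, a = 2, b = beta stays below beta/2 + 1."""
    return upper_bound_tx_next(beta, n=2, a=2, b=beta) <= Fraction(beta, 2) + 1

for beta in (4, 8, 10, 16, 64):
    print(beta, upper_bound_tx_next(beta, 2, 2, beta), second_iteration_holds(beta))
```

With these values the bound equals $\frac{\beta}{2} + \frac{10}{\beta}$, which stays below $\frac{\beta}{2} + 1$ exactly when $\beta \ge 10$.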

3.3 Argument reduction

Because of the method for constructing $d_n^x$ and $d_n^y$, this domain remains very small. Indeed, the best choices for these numbers (see section 3.1) are the numbers minimizing the values:

$$\left| \tilde{T}_n^x - \beta^n \ln\left(1 + d_n^x \beta^{-n}\right) \right| \qquad \text{and} \qquad \left| T_n^y - \beta^n \arctan\left(d_n^y \beta^{-n}\right) \right| \qquad (11)$$

The proposed choice (eq. 8) is efficient only if (see 6):

$$\left(\exp(\tilde{L}_n^x) - 1\right) \beta^n \simeq \tilde{L}_n^x \beta^n \qquad \text{and} \qquad \tan(L_n^y)\, \beta^n \simeq L_n^y \beta^n$$

Unfortunately, this is not true for the first two iterations. In other words, if $n = 1$ or $n = 2$ then $T_{n+1}^x$ and $T_{n+1}^y$ may be greater than $\frac{\beta}{2} + 1$. So, these iterations have to be replaced. Several methods can be used for the first iterations:

• It is possible to choose the values $d_n^x$ and $d_n^y$ from a lookup table indexed by the most significant bits of $T_n^x$ and $T_n^y$. This allows a better choice for the digits and will improve the reduction.

• It is possible to double an iteration, which will also improve the reduction but will need more computations.

• It is possible to change the radix of the algorithm, computing an iteration with radix $4\beta$ for instance. This is equivalent to coding the fractional part of $d_n^x$ and $d_n^y$ with 2 bits. The number of possible digits increases, but the choice and the reduction are better.

• It is possible to enlarge the set of possible digits for the next iteration.

It is also possible to simulate the first two iterations by several iterations of the radix 2 BKM algorithm.

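Example: Suppose that $\beta \ge 10$. The first iterations can be computed as follows.

First, note that we start from the following domain:

$$D = [\ln(2), \; 2\ln(2)] + i \left[ -\frac{\pi}{4}, \; \frac{\pi}{4} \right]$$

Reducing to this domain is well known, as shown in [Mul97]. We have to improve the reduction from $D$ to the domain $P_E$ (see 10). This second reduction only uses two iterations of BKM, but those iterations are performed differently. The first two iterations are computed as follows:

• For the first iteration, $d_n^y$ and $d_n^x$ are chosen from two lookup tables indexed respectively by $T_n^y$ and $\tilde{T}_n^x$ rounded with 1 bit for the fractional part.

• The set of digits for the second iteration is extended to $\{-\beta, \dots, \beta\}$. Furthermore, this iteration is performed twice.

More precisely, we define two lookup tables $\Lambda^x(n)$ and $\Lambda^y(n)$ such that:

• $\Lambda^x(n)$ is the integer $d$ minimizing the value $\left| \frac{n}{2\beta} - \ln\left(1 + d \beta^{-1}\right) \right|$ for $0 \le n \le \lceil 4\beta \ln(2) \rceil$.

• $\Lambda^y(n)$ is the integer $d$ minimizing the value $\left| \frac{n}{2\beta} - \arctan\left(d \beta^{-1}\right) \right|$ for $-2\beta \le n \le 2\beta$.

These tables verify:

$$\left| \frac{n}{2\beta} - \ln\left(1 + \Lambda^x(n) \beta^{-1}\right) \right| \le \frac{1}{2\beta} \qquad (12a)$$

$$\left| \frac{n}{2\beta} - \arctan\left(\Lambda^y(n) \beta^{-1}\right) \right| \le \frac{1}{2\beta} \qquad (12b)$$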
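The two tables can be generated offline. The sketch below builds them for one radix and measures the worst error, assuming the index ranges $0 \le n \le \lceil 4\beta\ln 2 \rceil$ and $-2\beta \le n \le 2\beta$ and targets $\frac{n}{2\beta}$. The names `build_table_x`, `build_table_y` and `BETA` are hypothetical, and radix 16 is only an example.

```python
import math

def build_table_x(beta):
    """For each n, the integer d minimising |n/(2*beta) - ln(1 + d/beta)|."""
    n_max = math.ceil(4 * beta * math.log(2))
    table = {}
    for n in range(n_max + 1):
        target = n / (2 * beta)
        guess = round((math.exp(target) - 1) * beta)   # real-valued optimum, rounded
        # guess >= 0, so 1 + d/beta stays positive for the three candidates
        table[n] = min((guess - 1, guess, guess + 1),
                       key=lambda d: abs(target - math.log(1 + d / beta)))
    return table

def build_table_y(beta):
    """For each n, the integer d minimising |n/(2*beta) - arctan(d/beta)|."""
    table = {}
    for n in range(-2 * beta, 2 * beta + 1):
        target = n / (2 * beta)
        guess = round(math.tan(target) * beta)
        table[n] = min((guess - 1, guess, guess + 1),
                       key=lambda d: abs(target - math.atan(d / beta)))
    return table

BETA = 16
table_x = build_table_x(BETA)
table_y = build_table_y(BETA)
worst_x = max(abs(n / (2 * BETA) - math.log(1 + d / BETA)) for n, d in table_x.items())
worst_y = max(abs(n / (2 * BETA) - math.atan(d / BETA)) for n, d in table_y.items())
print(worst_x, worst_y, 1 / (2 * BETA))
```

Both worst-case errors stay below $\frac{1}{2\beta}$ because consecutive values of $\ln(1 + d\beta^{-1})$ for $d \ge 0$, and of $\arctan(d\beta^{-1})$, are at most $\beta^{-1}$ apart.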

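We can do the iterations given in Table 1.

$$\begin{array}{lll} \text{Step } 1 & d_1^y = \Lambda^y\left( \left\lceil 2 \langle T_1^y \rangle_2 \right\rfloor_0 \right) & \tilde{L}_1 = L_1 - \ln\left(1 + i d_1^y \beta^{-1}\right) \\[4pt] \text{Step } 1 + \frac{1}{2} & d_1^x = \Lambda^x\left( \left\lceil 2 \langle \tilde{T}_1^x \rangle_2 \right\rfloor_0 \right) & L_2 = \tilde{L}_1 - \ln\left(1 + d_1^x \beta^{-1}\right) \\[4pt] \text{Step } 2 & d_2^y = \left\lceil \langle T_2^y \rangle_2 \right\rfloor_0 & \tilde{L}_2 = L_2 - \ln\left(1 + i d_2^y \beta^{-2}\right) \\[4pt] \text{Step } 2 + \frac{1}{2} & d_2^x = \left\lceil \langle \tilde{T}_2^x \rangle_2 \right\rfloor_0 & L = \tilde{L}_2 - \ln\left(1 + d_2^x \beta^{-2}\right) \end{array}$$

Table 1: The 4 half-iterations used by the E-mode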

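Considering $L_1 \in D$, we have:

$$\ln(2) \le L_1^x \le 2\ln(2) \qquad -\frac{\pi}{4} \le L_1^y \le \frac{\pi}{4}$$

By definition of $T_1^y$ (see 7), $|T_1^y| < \beta$, then $\left| \left\lceil 2 \langle T_1^y \rangle_2 \right\rfloor_0 \right| \le 2\beta$ and the value $d_1^y = \Lambda^y\left( \left\lceil 2 \langle T_1^y \rangle_2 \right\rfloor_0 \right)$ exists in the table.

We do the following iteration:

$$\tilde{L}_1^x = L_1^x - \frac{1}{2} \ln\left(1 + (d_1^y \beta^{-1})^2\right) \qquad (13a)$$

$$\tilde{L}_1^y = L_1^y - \arctan\left(d_1^y \beta^{-1}\right) \qquad (13b)$$

By definition of $d_1^y$, we have $|d_1^y| \le \beta$ and, with equation (13a), this easily gives a bound for $\tilde{L}_1^x$:

$$\ln(2) - \frac{1}{2}\ln(2) \le \tilde{L}_1^x \le 2\ln(2)$$

$$0 \le \tilde{L}_1^x \le 2\ln(2) \qquad (14)$$

Let us write:

$$C = L_1^y - \frac{\left\lceil 2 \langle T_1^y \rangle_2 \right\rfloor_0}{2\beta} \qquad \text{and} \qquad D = \frac{\left\lceil 2 \langle T_1^y \rangle_2 \right\rfloor_0}{2\beta} - \arctan\left(d_1^y \beta^{-1}\right)$$

The equation (13b) becomes

$$\tilde{L}_1^y = C + D \qquad (15)$$

By definition of $C$ and $T_1^y$ (see 7):

$$|C| \le \left| L_1^y - \frac{\langle T_1^y \rangle_2}{\beta} \right| + \left| \frac{2 \langle T_1^y \rangle_2}{2\beta} - \frac{\left\lceil 2 \langle T_1^y \rangle_2 \right\rfloor_0}{2\beta} \right| \le \frac{1}{\beta^3} + \frac{1}{4\beta}$$

$$|C| \le \frac{1}{2\beta} \qquad (16)$$

Furthermore, by definition of $D$ and (12b):

$$|D| \le \frac{1}{2\beta} \qquad (17)$$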
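The bounds on $C$ and $D$ can be observed by simulating the imaginary half of the first step. The sketch below assumes $T_1^y = L_1^y \beta$, a truncation to 2 fractional digits in radix $\beta$, and a table entry computed on the fly instead of being read from memory; the function names are hypothetical and radix 16 is only an example.

```python
import math

def truncate(t, beta, p=2):
    """<t>_p: truncate downwards, keeping p fractional digits in radix beta."""
    scale = beta ** p
    return math.floor(t * scale) / scale

def table_y_entry(n, beta):
    """Integer d minimising |n/(2*beta) - arctan(d/beta)| (computed, not stored)."""
    target = n / (2 * beta)
    guess = round(math.tan(target) * beta)
    return min((guess - 1, guess, guess + 1),
               key=lambda d: abs(target - math.atan(d / beta)))

def step1_imaginary(L_y, beta):
    """Return the digit d_1^y and the two error terms C and D."""
    T_y = L_y * beta
    index = math.floor(2 * truncate(T_y, beta) + 0.5)   # nearest integer to 2<T>_2
    d_y = table_y_entry(index, beta)
    C = L_y - index / (2 * beta)
    D = index / (2 * beta) - math.atan(d_y / beta)
    return d_y, C, D

beta = 16
results = [step1_imaginary(k / 100 * math.pi / 4, beta) for k in range(-100, 101)]
worst_sum = max(abs(C + D) for _, C, D in results)
print(worst_sum, 1 / beta)
```

Since $\tilde{L}_1^y = C + D$, the two bounds give $|\tilde{L}_1^y| \le \beta^{-1}$ over the sampled inputs.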
