
Bit-Width Optimizations for High-Level Synthesis of Digital Signal Processing Systems

Caaliph Andriamisaina, Bertrand Le Gal and Emmanuel Casseau

Abstract- In this paper we propose a methodology that takes bit-width into account to optimize the area and power consumption of hardware architectures provided by high-level synthesis tools.

The methodology is based on a bit-width analysis using information that comes from the designer. This bit-width information is propagated through a graph which models the application. The resulting annotated graph enables datapath structure optimizations for high-level synthesis without dramatically increasing its processing time (complexity: O(n)).

The methodology was applied to several signal and image processing applications. Our results demonstrate the effectiveness of the approach. It can also be applied in a more general design context for sizing the data of an application, knowing the input data formats and their potential correlation.

Index Terms- Data sizing, hardware design, high level synthesis, optimization

I. INTRODUCTION

MULTIMEDIA applications such as video and image processing are often characterized by a large number of computations. In these data-transfer, storage and computation-intensive applications, the data and computation bit-width is rarely constant throughout the application. This bit-width evolution requires the circuit designer to have a deep knowledge of the application in order to optimize the design using correctly sized hardware resources. This analysis is not trivial for the designer. The functional correctness of the architecture depends on the bit-width analysis for the usage profile considered. At the same time, using correctly sized hardware resources reduces area and power consumption, which are important features for embedded system design.

The increasingly demanding requirements of digital signal processing applications like multimedia, new generations of wireless systems, etc. have led to the definition of more and more complex algorithms and systems that have to be efficiently implemented under time-to-market constraints. Today, the electronic system design community is mainly concerned with defining efficient System-on-a-Chip (SoC) design

Manuscript received May 1, 2006; accepted August 13, 2006.

E. Casseau and C. Andriamisaina are with LESTER lab., CNRS FRE 2734, UBS University, France.

B. Le Gal is with IRISA-R2D2 lab., Rennes I University, France.

methodologies in order to benefit from the high integration capabilities of current ASIC and FPGA technologies on the one hand, and to manage the increasing algorithmic complexity of applications on the other hand.

To handle this complexity increase, methodologies [1] benefit from the emerging High-Level Synthesis (HLS) tools.

High-level synthesis [2], [3] is analogous to software compilation transposed to the hardware domain. HLS tools automate the process that generates an RTL architecture from an algorithmic description of the behavior given in the source specification. The resulting architecture respects the designer and system constraints and is reliable (error-free) compared to a hand-coded design. However, usual high-level synthesis tools provide fixed bit-width datapaths, i.e. over-sized architectures.

In this paper we propose a methodology that takes bit-width into account to optimize the area and power consumption of hardware architectures provided by high-level synthesis tools.

The paper is organized as follows. Section II presents related work on this topic. Section III presents our design flow. The models and the techniques used to optimize the architecture are detailed in Section IV. The results presented in Section V show the effectiveness of the proposed methodology.

II. RELATED WORK

Several high-level synthesis techniques have been proposed over the last two decades. However, conventional techniques usually focus on uniform-width resources. Recently, much work has investigated bit-width optimization in architecture design, but little of it addresses high-level synthesis.

The value range propagation through data-flow graphs in [4] is used to determine the minimum number of bits required for the integral part of floating point representation and for integers. A range analysis is also performed in [5] and [6] to optimize the bit-width for data and operations. The Bitwise Project [6] proposes a compiler that minimizes the bit-width.

Bitwise propagates integer variable ranges backwards and forwards through data-flow graphs. This method aims to remove the unwanted most-significant bits (MSBs).

Jason Cong et al. [7] developed a bit-aware analysis flow, including bit-width analysis, scheduling and binding. The flow is composed of four steps. First, a behavioral description in C is transformed into the Machine-SUIF intermediate representation. After compilation optimization, the bit-width analysis is performed as a stand-alone Machine-SUIF pass. The bit-width analysis introduced in [6] is used to


decide the minimum bit-width. In the second step, the MCAS architectural synthesis system [8] performs scheduling, binding and placement. In the third step, bit-aware re-scheduling and re-binding is performed to minimize the area cost of FUs, and the bit-aware register allocation and binding task is performed to minimize the area cost of registers. In the last step, the corresponding datapath and controller are generated.

Constantinides et al. [9] formulated the combined scheduling, binding, and word-length selection problem as an ILP, and proposed a heuristic solution in [10]. Two kinds of graphs are used: a sequencing graph which represents the data dependencies, and a word-length compatibility graph which represents the compatibility between operations and the sized operators that can implement them with respect to the word-length. A word-length optimization for high-level synthesis of digital signal processing systems has been developed in [11] using word-length optimization software which considers hardware sharing to reduce the hardware cost and minimize the optimization time. This software inserts quantizers into a data flow graph representation, partitions the resulting graph, determines the minimum required word-length for each partitioned signal, conducts scheduling and binding using the minimum word-length information, and finally optimizes the word-lengths of functional units. In [12], the potential of a precision-sensitive approach for the high-level synthesis of multi-precision DFGs has been explored. Register allocation, functional unit binding and scheduling algorithms that exploit the multi-precision nature of the DFG for area-efficient implementation, and an integrated methodology that exploits the interdependence of scheduling, allocation and binding, have been proposed. For example, an add-shift based hardware implementation has been preferred over a multiplier-based realization.

One approach based only on hardware allocation has been developed in [13]. An allocation algorithm has been proposed to minimize hardware waste by fragmenting operations into their common operative kernel, which may then be executed over the same functional units.

In our approach, we use a formal model annotated with bit-width information and dynamic range values in order to extract bit-wise information and optimize the high-level synthesis process without dramatically increasing its processing time. The optimization step is done after the scheduling and binding tasks to optimally resize the operators and registers.

III. DESIGN FLOW OVERVIEW

Our methodology is based on two steps, as presented in Fig. 1. First, a bit-width analysis of the application is performed according to input information provided by the designer.

The goal of this first part is to compute the lower-bound and upper-bound values of each computation and storage element which is implemented on hardware resources after synthesis. The methodology is based on a formal model which represents the application and can handle bounded data information. Once all bounds have been computed, the bit-widths necessary to model the data and to implement the operations are evaluated. This information is then used during the high-level synthesis process.

Fig. 1. Analysis and synthesis flow.

The second part of the methodology relates to the high-level synthesis process. High-level synthesis is used to formally transform the application into an architecture observing a set of constraints (latency, area, power, etc.). In our approach, an architecture optimization stage is carried out in order to adapt both operator and register bit-widths. Because of its low complexity, O(n), the methodology can be applied to current high-complexity DSP applications.

In this paper, the high-level synthesis tool we used is GAUT¹, which synthesizes applications under a real-time constraint.

IV. SYNTHESIS STEPS

A. Bit-width model and propagation

Our methodology particularly focuses on signal and image processing applications, which are generally regular and predictable. The modeling of such applications is generally performed using data (or signal) flow graph models. It is worth noting that our methodology can take into account conditional branches and hierarchy, even if these features are not highlighted in this paper. This is done using a CSFG (Control and Structure Flow Graph) model defined in [14].

An example of graph representation is illustrated in Fig. 2.

Each node (data and operations) of the model can be annotated with bit-width and bounds information. We define, for each graph node, attributes representing the information which is necessary for the computation, propagation and use of the node's bit-width.

Definition 1 (Lower-bound and Upper-bound)

For each node n belonging to graph G, there exists a lower-bound and upper-bound couple $[\beta_{min}(n), \beta_{max}(n)] \in [\Re, \Re]$ such that $\beta_{min/max}(n)$ is the minimal/maximal value that the data can take if $n \in Variable(G)$; if $n \in Operation(G)$, then $\beta_{min/max}(n)$ models the minimal/maximal value which can be produced by the operation modeled by node n.

¹ The GAUT tool is downloadable after a free registration on the LESTER web site: http://web.univ-ubs.fr/gaut/

These values make it possible to determine the static range of stored or produced data for the set of graph nodes. With this information, it is possible to carry out an automated bit-width analysis of the application.

According to the values of the $\beta_{min}$ and $\beta_{max}$ bounds of the graph nodes, it is also possible to determine the sign bit requirements of inputs and outputs. A sign bit annotation is important because negative values are coded in two's complement in the hardware architecture. Depending on the sign information, the architectural implementation of operators and registers is different.

Definition 2 (Sign extension)

Each graph node n belonging to graph G has an attribute noted $\psi(n) \to [0, 1]$ such that $\psi(n)$ indicates whether or not node n needs information modeling the sign of the handled data. This attribute $\psi(n)$ is zero if the data range indicates that the data is always positive, and one otherwise.

The set of defined information makes it possible to determine the optimal bit-width associated with all graph nodes, that is to say the minimum number of bits that are necessary to store or compute data.

Fig. 2. Formal representation model example.

Definition 3 (Bit-width implementation)

For each graph node n, there exists an optimal bit-width implementation noted $\Omega(n) \to \Re$ such that $\Omega(n)$ corresponds to the minimum number of bits that are necessary for register or operator hardware implementation. The bit-width implementation $\Omega(n)$ takes into account data coding as well as sign extension if necessary. The functions computing $\Omega(n)$ from $(\beta_{min}, \beta_{max}, \psi)$ are given in (1) for positive numbers and in (2) for negative numbers.

$\Omega(n) = \lfloor \log_2(\beta_{max}(n)) \rfloor + 1$ (1)

$\Omega(n) = \lfloor \log_2(\max(|\beta_{min}(n)| - 1, |\beta_{max}(n)|)) \rfloor + 2$ (2)

This optimal bit-width corresponds to the data sizing constraint of the hardware implementation, from which hardware resources are physically able to implement the set of computations existing in the application.
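As an illustration, equations (1) and (2) can be sketched in Python as follows. The function name `bit_width` and the restriction to integer bounds are assumptions made for this sketch and are not taken from the GAUT tool. For an integer v >= 1, `v.bit_length()` equals floor(log2(v)) + 1, which avoids floating-point rounding. The case where the argument of the logarithm is zero is not covered by (1) and (2); the sketch assumes one magnitude bit there.

```python
def bit_width(beta_min: int, beta_max: int) -> tuple[int, int]:
    """Return (omega, psi): bit-width and sign attribute of an integer range."""
    if beta_min > beta_max:
        raise ValueError("beta_min must not exceed beta_max")
    if beta_min >= 0:
        # Equation (1): floor(log2(beta_max)) + 1, unsigned coding.
        # Assumption: a range reduced to zero still gets one bit.
        return max(beta_max.bit_length(), 1), 0
    # Equation (2): floor(log2(max(|beta_min| - 1, |beta_max|))) + 2.
    magnitude = max(abs(beta_min) - 1, abs(beta_max))
    # Assumption: when magnitude is 0 (range [-1, 0]) keep one magnitude bit.
    return max(magnitude.bit_length(), 1) + 1, 1


print(bit_width(0, 255))     # (8, 0): unsigned 8-bit data
print(bit_width(-128, 127))  # (8, 1): signed 8-bit data
```

With these definitions, a difference of two unsigned 8-bit values, whose range is [-255, 255], is sized at 9 bits with a sign bit.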

Once these values are available, it is possible to extract information that will be useful for the optimization of the high-level synthesis process (Fig. 3).

We now detail the flow that allows the propagation of the input ranges through the graph.

First, the inputs of the graph are annotated according to the information provided by the designer. This information can be expressed in two ways:

- Inputs bit-width: in this case, the input information is presented as a couple $[\Omega, \psi]$.

- Inputs range: in this case, the designer specifies, for each input, the minimal and maximal values in the worst case. This information is presented as a couple $[\beta_{min}, \beta_{max}]$.

Our methodology allows these two ways of expressing bit-width to be merged for the same application. The attribute standardisation stage is then applied to model all the information in the same way, as $[\beta_{min}, \beta_{max}]$.

The input bit-width information provided by the designer is then propagated through the graph nodes.
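The attribute standardisation stage can be illustrated by the following minimal sketch. It assumes that an input given as a bit-width and sign couple may take any value of the corresponding unsigned or two's complement code, which is the worst case; the function name `to_range` is hypothetical.

```python
def to_range(omega: int, psi: int) -> tuple[int, int]:
    """Convert a [bit-width, sign] couple into a [beta_min, beta_max] couple."""
    if omega < 1 or psi not in (0, 1):
        raise ValueError("omega must be >= 1 and psi must be 0 or 1")
    if psi == 0:
        # Unsigned coding: 0 .. 2^omega - 1.
        return 0, (1 << omega) - 1
    # Two's complement coding: -2^(omega-1) .. 2^(omega-1) - 1.
    return -(1 << (omega - 1)), (1 << (omega - 1)) - 1


print(to_range(8, 1))  # (-128, 127)
print(to_range(2, 0))  # (0, 3)
```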

Definition 4 (Bound propagation function)

For each graph node n belonging to graph G, there exists a couple of functions $F = [\sigma_{min}, \sigma_{max}]$ which allows, for each node type, its lower bound and upper bound to be calculated as functions of the available input values. The function $\sigma_{min}(e_1, \ldots, e_n): \Re^n \to \Re$ calculates the lower bound of node n and the function $\sigma_{max}(e_1, \ldots, e_n): \Re^n \to \Re$ calculates its upper bound.

The data range propagation, from input nodes to output nodes, is performed by a low-complexity recursive algorithm.

According to the node type, the propagation function F(n) corresponds to a simple assignment (3) or to an arithmetic or logic operation (example of the addition operation in (4)). The propagation function library can be extended by the designer by adding more complex functions (MAC, FFT, etc.) corresponding to hardware resources or IP blocks the designer wants to use in the high-level synthesis process.

In order to be able to propagate lower and upper bounds through the set of graph nodes, it is necessary to have, for each node type, a suitable couple of functions $F = \{\sigma_{min}, \sigma_{max}\}$. These functions are composed of arithmetic and logic operations representing the behaviour of the nodes before and after hardware implementation.

$F(data) = [\sigma_{min} = \beta_{min}(e_1), \; \sigma_{max} = \beta_{max}(e_1)]$ (3)

$F(+) = [\sigma_{min} = \beta_{min}(e_1) + \beta_{min}(e_2), \; \sigma_{max} = \beta_{max}(e_1) + \beta_{max}(e_2)]$ (4)
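The propagation described above can be sketched as a memoized recursive traversal in which each node is visited once. The graph encoding, the names `LIBRARY` and `propagate`, and the multiplication rule (minimum and maximum of the four corner products) are assumptions made for this illustration; only the data and addition rules come from (3) and (4).

```python
def f_data(e1):
    """Equation (3): a data node copies the range of its single input."""
    return e1


def f_add(e1, e2):
    """Equation (4): the bounds of a sum are the sums of the bounds."""
    return e1[0] + e2[0], e1[1] + e2[1]


def f_mul(e1, e2):
    """Assumed rule, not from the paper: extrema of the four corner products."""
    corners = [a * b for a in e1 for b in e2]
    return min(corners), max(corners)


# Hypothetical propagation function library, indexed by node type.
LIBRARY = {"data": f_data, "+": f_add, "*": f_mul}


def propagate(graph, inputs):
    """graph: node -> (type, predecessor names); inputs: node -> (min, max)."""
    ranges = dict(inputs)

    def visit(node):
        if node not in ranges:  # each node is computed only once
            node_type, preds = graph[node]
            ranges[node] = LIBRARY[node_type](*(visit(p) for p in preds))
        return ranges[node]

    for node in graph:
        visit(node)
    return ranges


# Hypothetical graph computing S = (B * C) + A with signed 8-bit inputs.
graph = {"m": ("*", ["B", "C"]), "S": ("+", ["m", "A"])}
inputs = {name: (-128, 127) for name in ("A", "B", "C")}
print(propagate(graph, inputs)["S"])  # (-16384, 16511)
```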

Once the range interval values of each graph node are computed, the associated bit-width implementation requirement for each node is computed as presented in (1) and (2). Each node n is then annotated with the couple $[\Omega(n), \psi(n)]$ for the high-level synthesis step.

B. High-Level Synthesis Process

In this paper, the high-level synthesis tool we use is GAUT (Fig. 4). The behavioural description, specifying the behaviour of the application to implement, is written in a high-level language (C or behavioural VHDL).

The synthesis can be constrained by the designer with the target technology, throughput, I/O timing, etc. A compilation stage performs syntactic analysis, semantic analysis and code parallelization. The compilation provides an internal representation of the algorithm using a signal flow graph (SFG) model.

Fig. 4. Usual high-level synthesis flow.

The datapath unit synthesis starts with the selection of the operators. Then, the allocation step defines the number of operators of each type. The scheduling of the operations is then performed.

The binding stage consists in assigning each scheduled operation to an operator that is available at the considered time. After the scheduling/binding stage, hardware optimization techniques can be applied to optimize the architecture in terms of register sharing and bus usage.

Like GAUT, most high-level synthesis tools generate hardware architectures with a fixed datapath bit-width corresponding to the highest data bit-width among the computations to be performed. Oversized architectures are thus generated. From the bit-width analysis presented previously, it is possible to size the hardware operators and registers of the generated architecture correctly, that is to say to reduce the area and decrease the power consumption.

Previous high-level synthesis methodologies that take data bit-width into account operate during the selection and allocation steps (see Section II). These approaches lead to NP-complete problems. In order to reduce the processing time, our approach consists in optimizing the generated architecture after the binding step using bit-width information coming from the annotated representation model. The algorithm complexity is then O(n). The corresponding high-level synthesis flow is presented in Fig. 5.

Fig. 5. High-level synthesis flow including bit-width consideration.

After the scheduling/binding step, operations are bound to hardware resources (operators and registers). For each allocated shared resource, the proposed approach consists in determining its optimal size.

Definition 5 (Sign extension requirement)

Each operator/register composing the generated hardware architecture has a list of uses. Each use corresponds to a node in the formal representation model.

Thanks to the sign information of the nodes, the need for a signed hardware resource can be evaluated. If one use of a particular resource is signed, then the resource has to be signed.

Definition 6 (Bit-width requirement)

Each operator/register composing the generated hardware architecture has a list of uses. The minimal hardware requirement for a particular resource is the maximal bit-width requirement found in its list of uses.

The above definitions make it possible to determine, respectively, the sign bit and the minimal bit-width requirements for each allocated component of the architecture.
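Definitions 5 and 6 can be sketched as a single pass over the list of uses of each resource, which is linear in the number of bound nodes. The data layout (dictionaries named `binding` and `annotations`) and the function name `size_resources` are hypothetical and do not reflect GAUT's internal representation.

```python
def size_resources(binding, annotations):
    """Size each resource from the [omega, psi] annotations of its uses.

    binding: resource name -> list of node names bound to it
    annotations: node name -> (omega, psi)
    """
    sized = {}
    for resource, uses in binding.items():
        # Definition 6: maximal bit-width requirement among the uses.
        width = max(annotations[node][0] for node in uses)
        # Definition 5: signed as soon as one use is signed.
        signed = int(any(annotations[node][1] for node in uses))
        sized[resource] = (width, signed)
    return sized


binding = {"add0": ["a1", "a2"], "reg0": ["r1"]}
annotations = {"a1": (9, 1), "a2": (12, 0), "r1": (8, 0)}
print(size_resources(binding, annotations))
# {'add0': (12, 1), 'reg0': (8, 0)}
```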

C. Register optimizations

Register merging algorithms are used in order to increase the temporal sharing of registers. The register optimization algorithm implemented in the GAUT tool takes the datapath interconnections into account [15]. The features of a typical datapath generated by GAUT are presented in Fig. 6. It is based on elementary computation cells, also called "clusters". These clusters are composed of one operator and its associated registers, which are directly interconnected to the operator.

The register optimization algorithm shares registers inside the same cluster only, in order to reduce interconnection costs, which can become critical during the logic synthesis step if a register is placed far away from its connected operators.

Fig. 6. Typical data-path.

In our approach, the cost function determines when sharing a register is worthwhile and includes a bit-width difference metric.

V. EXPERIMENTS

In order to evaluate our methodology, the synthesis approach was applied to three widely used signal and image processing functions: a Sum of Absolute Differences (SAD) computation, a Finite Impulse Response (FIR) filter and a Fast Fourier Transform (FFT) algorithm.

A. Sum of Absolute Differences

The SAD computation is the basic operation used in block matching algorithms like the Full Search or Three Step Search algorithms. The formula used to compute the SAD is given in (6). The macroblock size considered (N) is 16x16, using four levels for the transparency.

$SAD = \sum_{y=0}^{N} \sum_{x=0}^{N} |I_1(y, x) - I_2(y, x)| \times Alpha(y, x)$ (6)

where $I_1(y, x)$ and $I_2(y, x)$ are the pixels at (y, x) and Alpha(y, x) is the transparency.
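For reference, equation (6) can be written as the following short Python sketch. The function name `sad` and the representation of the macroblocks as nested lists are assumptions; the loops simply cover every pixel of the blocks that are passed in.

```python
def sad(i1, i2, alpha):
    """Weighted sum of absolute differences over two equally sized blocks."""
    total = 0
    for row1, row2, row_alpha in zip(i1, i2, alpha):
        for p1, p2, a in zip(row1, row2, row_alpha):
            # |I1(y, x) - I2(y, x)| x Alpha(y, x)
            total += abs(p1 - p2) * a
    return total


block1 = [[1, 2], [3, 4]]
block2 = [[4, 2], [1, 1]]
weights = [[1, 0], [2, 3]]
print(sad(block1, block2, weights))  # 3*1 + 0*0 + 2*2 + 3*3 = 16
```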

B. FIR Filter

We have used a transposed structure for the FIR filter. This structure consists of multiplications and additions. The following equation describes the N-tap FIR computation:

$y(n) = \sum_{i=0}^{N-1} x(i) \times H(n - i)$ (7)

We have experimented with 512-tap and 1024-tap FIR filters.
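To illustrate why the bit-width is not constant inside such a filter, the sketch below propagates worst-case ranges along the chain of accumulations of (7). It is only an illustration under assumptions: samples and coefficients are both treated as full signed ranges, the product range is taken from the four corner products, the sum range follows (4), and the width follows (2). The function names are hypothetical.

```python
def signed_width(lo, hi):
    """Equation (2) for an integer range [lo, hi] with lo < 0."""
    return max(abs(lo) - 1, abs(hi)).bit_length() + 1


def fir_accumulator_widths(x_range, h_range, taps):
    """Width of the running sum after 1, 2, ..., taps accumulated products."""
    corners = [a * b for a in x_range for b in h_range]
    p_lo, p_hi = min(corners), max(corners)  # range of one product
    widths = []
    lo = hi = 0
    for _ in range(taps):
        lo, hi = lo + p_lo, hi + p_hi  # equation (4)
        widths.append(signed_width(lo, hi))
    return widths


widths = fir_accumulator_widths((-128, 127), (-128, 127), 512)
print(widths[0], widths[1], widths[-1])  # 16 17 25
```

Under these assumptions, the first adders of the chain need far fewer bits than the last ones, which is the kind of variation that a fixed bit-width datapath cannot exploit.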

C. Fast Fourier Transform

The FFT algorithm is used to reduce the computational complexity of the Discrete Fourier Transform (DFT), which requires N² operations, where N is the transform size. The commonly used FFT algorithm was developed by Cooley and Tukey. This algorithm reduces the DFT complexity from N² to N log2 N.

The following equation describes the DFT of an N-point sequence x(n):

$X(k) = \sum_{n=0}^{N-1} x(n) \times W_N^{nk}, \quad k = 0, 1, \ldots, N-1$ (8)

where $W_N = e^{-j 2\pi / N}$ is called the twiddle factor.

We have experimented with a 64-point complex FFT.
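As a reference point, the direct evaluation of (8) can be sketched as follows. It performs N multiply-accumulate operations for each of the N outputs, which is the N² cost that the FFT reduces. This is a floating-point illustration written for this text, not the fixed-point FFT that was synthesized; the function name `dft` is hypothetical.

```python
import cmath


def dft(x):
    """Direct N-point DFT of a non-empty sequence, following equation (8)."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)  # twiddle factor W_N
    return [
        sum(x[n] * w ** (n * k) for n in range(n_points))
        for k in range(n_points)
    ]


spectrum = dft([1, 1, 1, 1])
print([round(abs(value), 6) for value in spectrum])  # [4.0, 0.0, 0.0, 0.0]
```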

D. Results

Two syntheses were made. The first one uses the usual high-level synthesis process, i.e. with a fixed bit-width datapath, and the second one uses the proposed approach based on a variable bit-width datapath. For each synthesis, we report both the area of the datapath unit and the area of the datapath with its controller. The target technology was an FPGA and the FPGA device was a Virtex-II Pro XC2VP100.

Results were obtained using a complete design flow, i.e. high-level synthesis and then synthesis of the RTL architecture with the Xilinx ISE 7.1i synthesis and mapping tools (Xilinx, Inc.). Syntheses were made for various throughput constraints.

For each case study, inputs were signed and coded with 8 bits, which is a usual bit-width for the output of the analog-to-digital converters the data come from. H(n-i) and WN were also signed and coded with 8 bits. Alpha(y,x) took positive integer values varying from 0 to 3; unsigned 2-bit coding was used. Results are shown in Fig. 7, 8 and 9.

Using our approach, the area of the complete architecture (datapath and its controller) decreases by 17% to 43% for the SAD computation, by 30% to 40% for the 512-tap FIR, by 20% to 30% for the 1024-tap FIR and by 10% to 14% for the 64-point FFT.

In fact, the higher the throughput, the higher the gain.

For low throughputs there are fewer operators and registers to compute the whole application, i.e. they are more shared. They thus have to handle data whose bit-widths range from the smallest to the highest ones, that is to say they have to handle the worst case (highest bit-width). In these cases, the area reduction is smaller.

In the particular case of the FFT, the area saving is small because of the low variation between input, internal and output bit-widths. This was previously observed during the graph analysis, which shows the interest of this analysis.

Power decrease has not yet been evaluated.

Fig. 7. Synthesis results for the SAD computation with 4 levels for the transparency (area versus throughput in Mpixels/s, for the datapath and for the datapath with its controller, before and after modification).

Fig. 8. Synthesis results for the 512-taps and 1024-taps FIR (area versus throughput in Msamples/s).

Fig. 9. Synthesis results for the 64-points complex FFT.

VI. CONCLUSION

In this paper, we have presented a bit-width aware synthesis design flow based on two steps. First, a bit-width analysis of the application according to input information provided by the designer is performed. Then a high-level synthesis is completed. Using the results of the first step, an architecture optimization is performed in order to adapt both operator and register bit-widths. Thanks to its low complexity, O(n), the proposed methodology can be applied to current high-complexity DSP applications. The optimization step took less than one second to optimize each datapath.

The data and operation bit-width analysis (first step of the methodology) is not specifically dedicated to high-level synthesis. It can be used in a more general context for sizing the data of any DSP application, knowing the input data formats and their potential correlation.

In the same way, the hardware resource optimization we propose can be easily integrated into other existing high-level synthesis tools since it is completed after the main steps of the synthesis process.

REFERENCES

[1] E. Casseau, B. Le Gal, P. Bomel, C. Jégo, S. Huet, and E. Martin, "C-based rapid prototyping for digital signal processing", in Proc. of EUSIPCO, 2005.

[2] D. D. Gajski, N. D. Dutt, Allen C-H. Wu, Steve Y-L. Lin, High-Level Synthesis: Introduction to Chip and System Design, Kluwer Academic Publishers, Boston, MA, 1992.

[3] J. P. Elliott, Understanding Behavioral Synthesis: A Practical Guide to High-Level Design, Kluwer Academic Publishers, 2000.

[4] Nayak A., Haldar M., Choudhary A., and Banerjee P. "Precision and error analysis of MATLAB applications during automated hardware synthesis for FPGAs", Proceedings of DATE, 2001, pp. 722-728.

[5] Dong-U Lee, Altaf Abdul Gaffar, Ray C.C. Cheung, Oskar Mencer, Wayne Luk, George A. Constantinides "Accuracy Guaranteed Bit-Width Optimization", IEEE Transactions on CAD, 2006.

[6] Stephenson M., J. Babb and Amarasinghe S. "Bitwidth analysis with application to silicon compilation", Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, 2000, pp. 108-120.

[7] J. Cong, Y. Fan, G. Han, Y. Lin, J. Xu, Z. Zhang and X. Cheng "Bitwidth-Aware Scheduling and Binding in High-Level Synthesis", Proceedings of the ASP-DAC, Asia and South Pacific, 2005, pp. 856-861.

[8] J. Cong, Y. Fan, G. Han, X. Yang and Z. Zhang "Architecture and Synthesis for On-chip Multicycle Communication", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2004.

[9] Constantinides G.A., Cheung P.Y.K. and Luk W. "Optimal datapath allocation for multiple-wordlength systems", Electronics Letters, 2000, Issue 17, pp. 1508-1509.

[10] Constantinides G.A., Cheung P.Y.K. and Luk W. "Heuristic datapath allocation for multiple wordlength systems", Proceedings of DATE, 2001, pp. 791-796.

[11] Kum Ki-Il and Sung W. "Word-length optimization for high-level synthesis of digital signal processing systems", IEEE Workshop on Signal Processing Systems, 1998, pp. 569-578.

[12] V. Agrawal, A. Pande, and M. Mehendale "High Level Synthesis of Multi-Precision Data Flow Graphs", Proceedings of the 14th International Conference on VLSI Design, 2001, pp. 411-416.

[13] Molina M.C., Mendias J.M., Hermida R. "High-level allocation to minimize internal hardware wastage", Proceedings of DATE, 2003, pp. 264-269.

[14] B. Le Gal, E. Casseau, S. Huet and E. Martin "Pipelined Memory Controllers for DSP Applications Handling Unpredictable Data Accesses", in Proc. of VLSI, 2005, pp. 268-269.

[15] C. Jego, E. Casseau, E. Martin "Real time application architectural synthesis dedicated to sub-micron technologies", IEEE International ASIC/SOC Conference, 2000, pp. 397-401.
