
Academic year: 2022


A VLSI DESIGN FOR AN EFFICIENT MULTIPROCESSOR CACHE MEMORY

BY

Xiao Luo

A thesis

submitted to the School of Graduate Studies in partial fulfillment of the requirements for the

degree of Master of Science

National Library of Canada
Bibliothèque nationale du Canada

Canadian Theses Service
Service des thèses canadiennes

The author has granted an irrevocable non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of his/her thesis by any means and in any form or format, making this thesis available to interested persons.

The author retains ownership of the copyright in his/her thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without his/her permission.


Abstract

This thesis proposes a cache memory, used for a 32-bit processor system, which consists of four components: the Directory, Line Replacement Unit (LRU), Cache Memory, and Control Unit. An 8-way set-associative mapping method is employed in the directory. The Line Replacement Unit is based on the least recently used line replacement algorithm. The cache memory unit has a capacity of 8k bytes, 32 bytes in each line, and it is directly accessible to 1, 2, 3, or 4 bytes (one word) at once by the associated processor. This cache memory is designed for a multiple processor system as well as a single processor system; a write-through algorithm and an updating algorithm are combined together to keep the information in main memory consistent with that of the cache and to make the multicaches coherent. The hit ratios are predicted to be over 95 percent. A two-phase clock of 10 ns is employed to pipeline this cache, and it can turn out a result in 20 ns during read operations without line misses. This cache is implemented into a single chip, and is designed so that it is possible to build cache systems of various sizes using these chips, without decreasing the system speed. This cache memory has been laid out as a single integrated circuit using 3 Micron NT CMOS technology, and its electrical and logical behavior has been simulated.

Acknowledgments

First of all, I would like to express my sincere gratitude to my advisor Dr. Paul Gillard for his supervision, guidance, suggestions, encouragement, and patience, which have greatly helped me to complete this thesis.

I would like to thank Dr. Wlodek Zuberek, who has contributed his support and advice toward the development of this thesis, and the other staff members of the Department of Computer Science, who have provided me with their technical support. The support and patience of my wife were especially of great help.

I am very grateful to the Department of Computer Science and the School of Graduate Studies, Memorial University of Newfoundland, for providing me with an opportunity for graduate studies and financial support in the form of a fellowship and teaching assistantship during my study.

To the Memory of My Father and to My Mother

Contents

1 INTRODUCTION

2 BASICS OF CACHE MEMORY
  2.1 Overview of the Memory Hierarchy
  2.2 The Concept of Cache Memory
  2.3 The Basic Structure of Cache Memory
  2.4 The Line Size Choice
  2.5 A Survey of Cache Design

3 IMPLEMENTATION OF THE CACHE ALGORITHMS
  3.1 Cache Design Parameters
  3.2 The Structure of the Cache Memory
  3.3 The Address Space Mapping
  3.4 The Set-associative Mapping
  3.5 Implementation of the Directory
    3.5.1 The Line Slot of the Directory
    3.5.2 The Address Register
    3.5.3 The 32-bit Decoder for Set Selection
    3.5.4 The Line Number Generator
  3.6 The Line Replacement Unit
    3.6.1 One LRU Cell
    3.6.2 One Bit of the LRU Cell

4 THE MEMORY AND CONTROL UNITS
  4.1 Structure of the Memory
    4.1.1 The Cache Memory Register
    4.1.2 The 256-bit Row Decoder
    4.1.3 The Cache Memory
    4.1.4 The Data Bus Control Circuit
  4.2 The System Control Unit
    4.2.1 The Regular Read/Write Operations
    4.2.2 The Update Operations
    4.2.3 The Miss Operations

5 USE IN A MULTIPROCESSOR SYSTEM
  5.1 The Coherence Solution Strategy
    5.1.1 The Protocols between the Bus and the Cache
    5.1.2 The Protocols between the Processor and the Cache
  5.2 External Interface
    5.2.1 The Interface Signals
    5.2.2 The Timing Operations
  5.3 Consideration of the System Bus and Main Memory
  5.4 Simulations of the Cache-based Multiprocessor
  5.5 Testing the Cache Memory

6 CONCLUSIONS

List of Figures

1 A Von Neumann Computer Organization
2 A Typical Memory Hierarchy
3 The Basic Cache Memory Structure
4 The Set-associative Mapping
5 The Directory
6 Simulation of the Tag Array
7 The Directory Tag
8 The Tag Bit and Valid Bit of the Directory
9 One Bit of the Address Register
10 The D-type Rising-edge-triggered Flip-flop
11 The 32-bit Directory Decoder
12 The 16-bit Decoder
13 Simulation of the 32-bit Decoder
14 The Line Number Generator
15 Layout of the Line Number Generator
16 Simulation of the Line Number Generator
17 Structure of the LRU Unit
18 The One-shot Circuit
19 Structure of an LRU Cell
20 An Example of the LRU Algorithm
21 Layout of the LRU Cell
22 One Bit of the LRU Cell
23 Simulation of the LRU Unit
24 Structure of the Cache Memory
25 One Bit of the Memory Register
26 The Register/Counter Logical Circuit
27 Layout of the Memory Column Control
28 Simulation of the Memory Column Control
29 Circuit of the Transfer Decomposer
30 The 256-bit Memory Decoder
31 Simulation of the 256-bit Row Decoder
32 The Memory Arrays
33 Four Bits of the Memory
34 The Data Bus Control Circuit
35 One Bit of the Gate Circuit
36 The Gate Control Logic
37 Layout of the Gate Logic
38 Simulation of the Gate Logic
39 The Bus Write/Read Operation Control
40 Simulation of the Write Control
41 Simulation of the Read Control
42 The Single Following Pulse Producer
43 The Clock Pulse Generator
44 A Timing Diagram for the Clock Pulse Generator
45 The Circuit of the Bus Update Watcher
46 The Update-write Generator
47 The Miss Circuit
48 The Circuit for the Bus Control Signal Generator
49 The Update Request Clear Circuit
50 The Read Valid Circuit for Read Miss
51 A Typical Cache-based Multiprocessor System
52 Communication between the Cache and Bus for a Write Operation
53 Communication between the Cache and Bus for a Line Miss
54 Communication for an Update Operation
55 The Processor Subsystem
56 Pin Functions
57 A Timing Diagram for Read/Write Operations
58 A Timing Diagram for Read Operations with a Line Miss
59 A Timing Diagram for Write Operations with a Line Miss
60 The N-User 1-Server Bus Arbiter
61 The Arbitration Unit
62 The Shared Main Memory Partitioned into 8 Modules
63 Simulations for the Multiprocessor System
64 The Testing Circuit for Shifting-out Cache Line Addresses

List of Tables

1 The Design Target Miss Ratios
2 The Relevant Cache-mapping-type Ratio
3 The Truth Table Relating the "Hot Code" and the Binary Code
4 Time Delay for the Directory
5 The Gate Control Functions

1 INTRODUCTION

In 1945, John Von Neumann made proposals for a digital electronic computer structure. In his proposals, the basic logical structure of a digital computer system has the following characteristics:

1. It has an input medium, by means of which an essentially unlimited number of operands or instructions may be entered.

2. It has a storage, from which operands or instructions may be obtained and into which results may be entered, in any desired order.

3. It has a calculating unit, capable of carrying out arithmetic and logical operations on any operands taken from storage.

4. It has an output medium, by means of which an essentially unlimited number of results may be delivered to the users.

5. It has a control unit, capable of interpreting instructions obtained from memory or storage, and capable of choosing between alternate courses of action on the basis of computed results.

In general, a computer which meets the criteria defined as the Von Neumann architecture is organized as shown in Fig. 1. Although the components of the five parts of the basic structure and the technologies used may vary widely, the functions of the parts may be clearly identified in virtually any digital computer.

Figure 1: A Von Neumann Computer Organization

Memory is the source of all information, data, and instructions flowing to and from the four other parts. The data and instructions are stored in the memory cells, each of which is associated with a location, or address. The cells can be accessed by other parts of the computer by means of these addresses.

The main functions of input and output, as indicated by their names, are to derive information from and to deliver the results and other information to the outside world. They also have two subsidiary functions, buffering and data conversion. The buffering function provides an interface and synchronization between the processing part of the computer and the outside world. The conversion function can convert the data type in the processing unit into forms used outside the computer system.

The processing part of a computer, referred to as the arithmetic-logic unit, implements the various arithmetic and logic operations on operands obtained from the memory. The results, after these operations, are typically stored back in the memory.

The control unit obtains instructions from the memory, decodes them, and, depending on their meaning, sends the appropriate control signals to other parts of the computer so that the desired operations will be accomplished. It also makes decisions about what action must be taken after receiving the results of various tests that are made by the arithmetic-logic unit. The combination of the arithmetic-logic unit and control unit is known as the central processing unit, or processing element in the case of multiple processor systems.

Until the last two decades, almost all the electronic digital computer systems used this Von Neumann architecture. Even when the underlying architectures of the computer systems began to contain a limited amount of parallelism (such as in the CDC 6600, for example) it was generally concealed from the users. In this period, the demand for higher speed, larger storage, and more reliable computer systems was rapidly increasing because large scale computation applications were visualized. The demand was such that, despite many technological advances in electronics, uniprocessor systems proved to be inadequate for the most highly computationally intensive problems, since the point had been reached where communication delays between switching elements or integrated circuits play a dominant role in the speed of the computation. Therefore, new ways had to be found to meet these requirements. The general approach is based on parallelism, implying that computer architectures will have to depart from the strict Von Neumann concept.

Parallelism in various forms had already appeared in computers produced during the 1960's, and has proved to be an effective approach. In this context, parallelism does not only mean the replication of logic but also has other meanings. For example, a uniprocessor using a pipelined instruction unit and a pipelined arithmetic unit, as well as the implementation of multiple programs executed "simultaneously", all imply concepts of parallelism. Therefore parallelism in a computer system presently has three meanings:

1. Time interleaving

2. Resource replication

3. Resource sharing

Time interleaving introduces a time factor into the concept of parallelism. That is, several process steps are interleaved in time, each using a part of the same hardware at different times. In this case, it is not necessary to have a replication of hardware to increase the performance of a computer system. Pipelining is an example of time interleaving.

Resource replication is the replication or addition of hardware units which can operate simultaneously on a problem, thereby attaining computational power through replication of logic, rather than relying solely on fast individual gates and small dimensions to reduce logic delay in order to obtain high speed. Multiple processes using the same hardware in some time-slice order are an example of resource sharing.

Since parallelism was introduced into computer architecture, various parallel computer architectures such as vector processors, pipelines, array processors, as well as multiprocessor architectures, have been developed and used to handle large quantities of data simultaneously and concurrently with high performance. I/O processors have been used for input and output to speed up communication between the processing elements and external storage or users. Thus, in general, parallelism includes not only simultaneity but also concurrency. The former means that two or more events occur at the same time and the latter means that two or more events occur within a given interval of time.

On the other hand, memory has been organized in different ways in order to obtain access speeds compatible with that of processing elements and to have a larger capacity. In general, there are two basic approaches: one is to organize the memory as a memory hierarchy, the other is to decompose the memory into several modules shared by the processors in the system.

These kinds of computer architectures are, more or less, not strictly Von Neumann structures; indeed, the multiple processor systems in particular have quite different characteristics.

2 BASICS OF CACHE MEMORY

Throughout the history of electronic computers, whenever developments have taken place in computer systems which increase processor speed, there is corresponding pressure to have the memory match this speed and, at the same time, increase its capacity. Therefore, performance improvements in computers have been associated with improvements in memory capacity and speed. Although both processors and main memory systems have been improved by steadily developing technologies and novel architectures, there has been a persistent mismatch between the speed of processors and that of main memory. That is, the main memory is slow relative to the processors. The memory system limits the speed at which input data can be delivered to a processor and the results received from the processor. This effect is called the Von Neumann bottleneck. Hence there has been a constant need for steady improvements to main memory subsystems for high overall system performance.

Approaches of interest toward improving memory speed and capacity have been the following [2, 5, 7]:

1. Memory hierarchies and virtual memory

2. Cache memories

3. Development of larger and faster memory chips

4. Memory interleaving

2.1 Overview of the Memory Hierarchy

In order to improve the performance of computer systems, especially single processor systems, there are two approaches to speed up a memory system with a large capacity. One is to develop a higher speed memory system with a larger capacity, the other is to partition a memory system into an efficient memory hierarchy consisting of several levels of subsystems with various speeds and sizes [2, 3, 5]. The first approach seems more straightforward and simple: to have a fast one-level memory with a large capacity. However, even with improvements in technology, a fast memory system with a large capacity is still very expensive, so that it is necessary to use slower memory at a lower cost to create a memory system with a large enough capacity. In order to give the memory subsystem an adequate effective speed, the memory subsystem can be organized as a hierarchical memory system. This kind of memory system can be matched to both the speed and size requirements of the high-speed processor at relatively low cost. A typical hierarchical memory structure is depicted in Fig. 2. The top level of the memory hierarchy (near the processor) has the fastest speed but also the highest cost. Therefore, the capacity of this level is made smaller to decrease system costs. For the lower levels, the speed of the subsystem decreases while the capacity increases. At the level on the bottom of the memory hierarchy, the memory subsystem possesses the largest capacity, but the slowest speed, with the lowest cost per word stored. In this memory hierarchy, each level is directly connected to the immediately higher level. That is, each memory subsystem can directly communicate with the immediately higher or lower subsystem in the hierarchy. For example, the processors can directly communicate with the first-level memory, e.g. register array or cache memory; and similarly the first-level subsystem can communicate with the second-level one, as shown in Fig. 2, and so on. Generally the top-level subsystem, such as cache memory, is used to attempt to bridge the speed gap between the processors and the lower level subsystem, while the lower level memory subsystems are employed to enlarge the capacity of the whole memory system.

Figure 2: A Typical Memory Hierarchy

2.2 The Concept of Cache Memory

The concept of cache memory was proposed by Wilkes [1965] in a brief article in which he described a system that contained two kinds of main memory: one was conventional, and the other was unconventional high-speed memory called at that time slave memory, now called cache memory. In 1968, the first real cache memory was implemented on the IBM 360/85. Since then, use of cache memory has rapidly increased on a wide range of computer systems, initially on mainframes, then on minicomputers, and today even on microcomputers.

Cache memory, a relatively small, high speed random access memory, is designed for transparently bridging the speed gap between the CPU and main memory, since it typically has a speed compatible with that of the CPU. This means that a cache memory in a cache-based system is invisible and not directly accessible to users or even to system operators. Typically, the speed of cache memory is five to ten times faster than that of main memory. Using this kind of memory hierarchy, the computer may seem to have a one-level memory with the capacity of the slow main memory and the speed of the cache memory [2].

The idea of the cache memory, similar to the primary-secondary virtual memory, is to duplicate the active portions of a lower speed memory in a high speed, but smaller, memory. Only the data most likely to be needed in the near future by the CPU reside in the cache, and obsolete data are automatically replaced by the newly requested data. In general, the speed of the cache memory is matched to the maximum data rate of the processor so that the processor can access data in the cache without delay, whenever the data requested by the processor are found in the cache. If the requested data are not in the cache, a cache miss occurs, and a request is made to the main memory for transfer of the requested data to the cache. If the data currently reside in the main memory, they are transferred to the cache immediately. If they are not, but are in the secondary memory, a request is issued to bring the requested data from the backing storage. Therefore, when the required references to the memory can be captured by the cache, speed is not degraded. Otherwise, the performance will be degraded by the time required to transfer data from the main memory to the cache.
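The degradation caused by misses can be illustrated with a simple two-level model. The following is only a sketch under stated assumptions: a hit is assumed to cost one cache access and a miss one main memory access, and the function name and the 20 ns and 200 ns timings are hypothetical values chosen for illustration, not measurements of any particular design.

```python
def effective_access_time(hit_ratio: float, t_cache_ns: float, t_main_ns: float) -> float:
    """Average access time seen by the processor in a two-level hierarchy.

    Assumed model: a hit costs t_cache_ns, a miss costs t_main_ns.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie in [0, 1]")
    return hit_ratio * t_cache_ns + (1.0 - hit_ratio) * t_main_ns


# Hypothetical timings: cache 20 ns, main memory 200 ns (ten times slower).
for h in (0.80, 0.90, 0.95, 0.99):
    t = effective_access_time(h, 20.0, 200.0)
    print(f"hit ratio {h:.2f}: effective access time {t:.1f} ns")
```

Under this model the effective speed approaches that of the cache only when the hit ratio is close to one, which is why the miss ratio receives so much attention in cache design.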

The use of cache memories in modern computer systems is based on the locality of memory references, both spatial and temporal [7, 9]. Spatial locality refers to the property that memory accesses over a short period of time tend to be clustered in space. This type of behavior can be expected based on the common knowledge of typical program behavior: related data items (variables, arrays, etc.) are usually stored together and instructions are mostly executed sequentially. Temporal locality refers to the property that references to a given location are typically clustered in time. This type of behavior can be expected from program loops in which both data and instructions are reused. Therefore, use of a cache memory in a computer system can minimize the interconnection network traffic between the processor and main memory and speed up the system, since the access delay of the memory system and the frequency of references to the slower main memory are greatly reduced.
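The effect of spatial locality can be sketched with a small experiment. The code below is an illustrative model only: it assumes a word-by-word sequential sweep through memory, an initially empty cache, and a cache large enough that no line is ever evicted. The function name and the 4-byte word size are assumptions made for the example.

```python
def sequential_miss_ratio(n_bytes: int, line_size: int, word_size: int = 4) -> float:
    """Miss ratio of a sequential word-by-word sweep over n_bytes.

    Assumes an initially empty cache that never evicts, so a miss
    occurs only on the first reference to each line.
    """
    resident_lines = set()
    misses = 0
    accesses = 0
    for addr in range(0, n_bytes, word_size):
        line = addr // line_size          # line that contains this address
        accesses += 1
        if line not in resident_lines:
            misses += 1                   # first touch of the line: fetch it
            resident_lines.add(line)
    return misses / accesses


for line_size in (4, 8, 16, 32):
    ratio = sequential_miss_ratio(4096, line_size)
    print(f"line size {line_size:2d} bytes: miss ratio {ratio:.3f}")
```

For purely sequential references each miss brings in a whole line, so the following references to the same line hit. Real programs mix sequential and non-sequential references, so the benefit of a longer line is smaller than this idealized sweep suggests.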

2.3 The Basic Structure of Cache Memory

The capacity of cache memory is far smaller than that of main memory: that is, the address space of cache memory is far smaller than the address space of main memory. Therefore cache memory requires an address mapping mechanism to translate the main memory addresses, at a high speed, into the cache memory address where the copies of data in the main memory reside. Also, because the most active portions in the main memory are copied in the cache memory, if the cache memory is full and the associated processor needs data not in the cache memory, some of the data in the cache will be replaced with the newly requested data from the main memory. There must exist an algorithm which can predict that the data to be replaced will not be used in the near future. Since the speed of the cache memory is the key factor in cache memory design, this kind of algorithm must be implemented in hardware. Hence, the basic structure of a cache memory should have at least three basic hardware components: an address mapping mechanism, a data replacement unit, and storage for the data in the cache.
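As a sketch of what the address mapping mechanism has to do, the code below splits a main memory address into a tag, a set index, and a byte offset. The line size of 32 bytes and the 8-way organization are taken from the abstract; the figure of 32 sets is an assumption derived from 8K bytes / (32 bytes x 8 ways), and the bit layout (offset in the low bits, set index above it) and the function name are illustrative assumptions rather than the layout used in the actual directory.

```python
LINE_SIZE = 32   # bytes per line (from the abstract)
NUM_SETS = 32    # assumption: 8K bytes / (32 bytes per line * 8 ways)

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 5 bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1     # 5 bits select a set


def split_address(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, set_index, byte_offset)."""
    byte_offset = addr & (LINE_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, byte_offset


tag, set_index, byte_offset = split_address(0x12345678)
print(f"tag={tag:#x} set={set_index} offset={byte_offset}")
```

Only the tag needs to be stored and compared in the directory; the set index selects which group of lines to search, and the offset selects the bytes within the line that is found.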

The basic functions of a cache memory can generally be described as follows:

Each reference from the processor to a memory location is presented to the cache memory. The cache first searches the directory of the address mapping mechanism to see if the requested data reside in the cache memory. If the requested data are in the cache, the data are operated on to satisfy the processor immediately without disturbing the main memory. If the data are not resident in the cache, a cache miss occurs which will cause the transfer of the new data from the main memory to the cache. Then the requested data can be referenced by the processor. Before transferring a new line to the cache, some data have to be removed from the cache memory to make room for the new. Which old data in the cache will be discarded is determined by the data replacement unit. Therefore, the cache-replacement decision directly affects the performance of the cache. A good replacement algorithm can make the cache have a somewhat higher performance than a bad algorithm.
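The sequence just described (directory search, hit or miss, replacement) can be sketched in software. This is a behavioral model only, not the hardware design: the class and method names are hypothetical, the default parameters assume 32 sets of 8 lines with 32 bytes per line, and least recently used replacement is modelled with an ordered dictionary per set.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Behavioral sketch of a set-associative cache with LRU replacement."""

    def __init__(self, num_sets: int = 32, ways: int = 8, line_size: int = 32):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set: keys are tags, oldest entry first.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Reference one address; return True on a hit, False on a miss."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:                   # directory search succeeds
            lines.move_to_end(tag)         # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:        # set is full: make room
            lines.popitem(last=False)      # discard least recently used line
        lines[tag] = None                  # bring the new line in
        return False


cache = SetAssociativeCache()
for _ in range(2):                         # sweep the same 4K bytes twice
    for addr in range(0, 4096, 4):
        cache.access(addr)
print(f"hits={cache.hits} misses={cache.misses}")
```

Swapping the `popitem(last=False)` line for a different victim choice is enough to compare replacement policies on the same reference stream.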

Since a cache memory has a high speed compatible with that of the associated processor, all the algorithms of a cache memory have to be implemented in hardware. Therefore, the designers of a cache memory have to consider not only how to implement its functions but also how to implement these functions with practical hardware.

Traditionally, a cache system is built with a single cache for both data and instructions. This cache is called a unified cache, in which case the CPU's components have only one cache unit to refer to for both instructions and data. The associated processor shares the same cache for data and instructions, which makes more efficient use of a limited resource and lowers the average miss ratios. A cache system can also be split into two separate caches: one for data, and the other for instructions. One of the major advantages of splitting data and instructions into two separate caches is that conflicts between simultaneous instruction fetches and data reads and writes are eliminated [9].

2.4 The Line Size Choice

The performance of most computers depends strongly on the quality of the cache design and the way in which it is implemented. Therefore, cache design is a very significant part of computer system design. In order to design a high-performance cache memory, there are several choices to be made and parameters to be set. Designers have to make decisions about the algorithms (fetch, placement, etc.), about the best sizes (cache size, line size, etc.), and about the ways of address mapping and maintaining consistency among several caches in a multiprocessor. Designers also have to make tradeoffs in setting these parameters, e.g. cache size, line size, the set-associativity, and so on. Each of these parameters affects cache performance; choosing different parameters produces different cache performance. The cache line size is a very important parameter that strongly affects the cache performance, especially the cache miss ratio [11]. Many surveys of cache memory and/or memory hierarchy performance have been made for high performance systems. In these surveys, the cache line size choice, with the overall cache size, has been shown to strongly affect the cache miss ratio. Smith suggested in [9] the line size giving the minimum miss ratio for a given cache memory capacity. He also indicated that the minimum number of elements per set in order to obtain an acceptable miss ratio is 4 to 8. Beyond 8, the miss ratio is likely to decrease very little. After a great number of simulations, Smith [11] presented practical values for the miss ratio as a function of cache size and line size which are listed in Table 1. The Design Target Miss Ratios (DTMR) shown in Table 1 are proposed for unified caches, instruction caches, and data caches, respectively. The DTMR provide designers with a reference to implement a variety of new systems. They can be used to estimate the performance impact of certain design choices. The models for the DTMR assume demand fetch, copy-back caches with an LRU replacement algorithm. They also are fully associative for address mapping,

Cache Type: Unified
Cache Size                    Line Size (bytes)
(bytes)        4        8        16       32       64       128
32           0.717    0.556    0.516    0.753      -        -
64           0.686    0.488    0.4      0.48     0.72       -
128          0.674    0.467    0.35     0.33     0.428    0.686
256          0.643    0.42     0.3      0.258    0.276    0.386
512          0.596    0.39     0.27     0.216      ?      0.257
1024         0.473    0.309    0.21     0.102    0.137      ?
2048         0.405    0.258    0.11     0.121    0.098    0.093
4096         0.329    0.193    0.12     0.082    0.059    0.05
8192         0.232    0.135    0.08     0.05     0.033    0.025
16384        0.182    0.103    0.06     0.036    0.23     0.016
32768        0.124    0.07       ?      0.024    0.014    0.009

Cache Type: Instructions
Cache Size                    Line Size (bytes)
(bytes)        4        8        16       32       64       128
32           0.125    0.478    0.33     0.247      -        -
64           0.674    0.438    0.3        ?      0.191      -
128            ?      0.397    0.27     0.197      ?        ?
256          0.592      ?      0.25       ?      0.138      ?
512          0.562    0.348    0.23     0.159    0.119    0.108
1024         0.504    0.308    0.20       ?      0.098      ?
2048         0.391    0.234    0.15       ?      0.068      ?
4096         0.271    0.161    0.1        ?        ?        ?
8192         0.112    0.1      0.06     0.037      ?        ?
16384        0.148    0.085    0.05     0.029    0.018      ?
32768        0.091    0.052    0.03     0.017    0.01       ?

Cache Type: Data
Cache Size                    Line Size (bytes)
(bytes)        4        8        16       32       64       128
32           0.131    0.611      ?      0.715      -        -
64           0.66     0.515    0.45     0.495      ?        -
128          0.561    0.412    0.35     0.351    0.467    0.677
256          0.47     0.337    0.28       ?      0.326    0.156
512          0.345    0.246    0.2        ?      0.215    0.282
1024         0.283    0.211    0.16     0.138    0.14     0.161
2048         0.256    0.169    0.12       ?      0.083      ?
4096         0.247    0.153    0.1      0.07       ?      0.048
8192         0.211    0.129    0.08     0.053    0.039    0.032
16384        0.161    0.097    0.06     0.039    0.26     0.019
32768        0.108    0.065      ?        ?      0.017    0.012

Table 1: The Design Target Miss Ratios

CACHE TYPE ADJUSTMENTS

Cache Type                   Ratio of Miss Rate     Ratio of Miss Rate
                             to Direct Mapping      to Full Associative
direct-mapped                      1.00                   1.515
two-way set-associative            0.78                   1.182
four-way set-associative           0.70                   1.061
eight-way set-associative          0.67                   1.015
full associative                   0.66                   1.000

Table 2: The Relevant Cache-mapping-type Ratio

except for those with 4 and 8 byte line sizes, which are 4-way set-associative. The cache miss ratio is also related to the mapping methods used. There are three mapping methods: direct-mapped, S-way set-associative, and fully associative. These are described in the next chapter. Values in Table 2 express the relative ratios of miss rates based on both the direct-mapped and fully associative mapping methods. These cache type adjustments originally are from [30]. They are based on the direct-mapped method, and are expanded to be used for those based on the fully associative method. Since the miss ratios shown in Table 1 are based on the fully associative model, in order to estimate the actual miss ratio of other systems, the modified miss ratio can be obtained by multiplying the given miss ratio found in Table 1 by the corresponding relevant cache-mapping-type ratio from the column of Table 2 labelled Ratio of Miss Rate to Full Associative.
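The adjustment can be written as a one-line calculation. The sketch below uses the Table 2 ratios relative to the fully associative model; the dictionary keys and function name are hypothetical, and the input value of 0.05 is taken as the Table 1 entry for a unified 8192-byte cache with 32-byte lines, assuming the table columns correspond to line sizes of 4 to 128 bytes.

```python
# Ratios of miss rate relative to a fully associative cache (Table 2).
RATIO_TO_FULL_ASSOCIATIVE = {
    "direct-mapped": 1.515,
    "two-way": 1.182,
    "four-way": 1.061,
    "eight-way": 1.015,
    "full": 1.000,
}


def adjusted_miss_ratio(dtmr: float, mapping: str) -> float:
    """Scale a fully associative design target miss ratio to another mapping."""
    return dtmr * RATIO_TO_FULL_ASSOCIATIVE[mapping]


# Assumed Table 1 entry: unified cache, 8192 bytes, 32-byte lines.
dtmr = 0.05
for mapping in RATIO_TO_FULL_ASSOCIATIVE:
    estimate = adjusted_miss_ratio(dtmr, mapping)
    print(f"{mapping:14s} estimated miss ratio {estimate:.4f}")
```

For example, a four-way set-associative organization of this size would be estimated at 0.05 x 1.061, or about 5.3 percent.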

2.5 A Survey of Cache Design

Since IBM Corporation introduced the first commercial cache memory in its System 360/85 to bridge the speed gap between the processor and main memory, various cache memories have been employed in different types of computers to achieve higher performance. A number of approaches have been used for developing high performance cache memories. Although the operation of a typical cache memory seems relatively simple in concept, implementation of a realistic cache memory is quite complex, involving many factors which influence cache performance. These factors involve internal factors such as cache capacity, line size, address mapping strategy, fetch algorithm, placement algorithm, replacement algorithm, as well as the swapping algorithm, and external factors or system factors: processor organization, hierarchical memory organization, as well as the interconnection network, such as the system bus. For supercomputers, synchronization is a more serious problem since at least two or more processors are embedded in the system. Therefore, attempting to evaluate cache performance exactly in a realistic computer system is quite difficult. We can, however, use approximate models for evaluation of cache behavior and performance.

Cache performance can be described with reference to two aspects [9]: the miss rate and access time. The first aspect is cache access time, the time required for the processor to get information from or store information into the cache. The access time depends not only on the design itself but also on the technology used in cache design. Therefore, the effect of design changes on access time is difficult to predict without specifying the circuit technology used. The second aspect is the miss ratio of the cache memory, the fraction of all memory references attempting to access data which are not resident in the cache memory. In general, every cache miss makes the processor wait until the desired data can be received. The miss ratio is related not only to how the cache design affects the number of misses, but also to how the machine design, including hardware and software, affects the number of cache references (main memory references). For example, the cache miss ratio depends on the program locality implied by software and the amount of information (one word, two words, etc.) obtained by the processor at a cache reference.

Many computer systems (almost all modern supercomputer and large computer systems) have cache memories of various designs to bridge the speed gap between processor and main memory in order to improve system performance. This section presents a survey of cache memories and their performance in several typical cache-based computer systems.

A high-speed cache memory was employed in the IBM System 370 Model 168. The cache was available in a size of either 8k or 16k bytes. The 8k-byte cache memory had a cycle time of 80 ns (the same as the machine cycle time) for accessing 4-byte data. It was organized into 64 sets as a 4-way set-associative cache. The write-through scheme was used for updating the main memory. The average miss ratio was about 7 percent [27], and the miss ratio prediction, according to the DTMR, is 5.3 percent.

The IBM 3033 has a 64K-byte cache memory for both instructions and data with a 57 ns cycle time. This large, high-speed cache memory is one of the main reasons for the high performance enhancement of the 3033. This cache is organized into 64 sets as a 16-way associative cache. The line size of the IBM 3033 is 64 bytes. Also, the write-through policy is employed in the IBM 3033. In this application, the main memory is divided into 8 modules so that main memory can transfer a line by interleaving [5].

The VAX-11/780 is a 32-bit high-performance minicomputer first introduced by DEC in 1978. Its cache has an 8K-byte capacity organized into 512 sets, two lines per set, and 8 bytes (4 bytes per word) in each line [5]. For the cache memory of the VAX-11/780, a distinction is made between a read and a write miss. If there is a read miss, the required line has to be retrieved from the main memory and written into the data cache. If the two lines in the given set are full, some sort of line replacement strategy has to be employed to determine which line is swapped with the newly required line. The VAX-11/780 cache memory uses a random replacement strategy as its policy for updating the line. If there is a miss caused by a write operation, only the referenced location of the main memory is updated. This data cache uses a buffered write-through policy. The miss ratio of the VAX-11/780 was measured to be about 13.05 percent [31], and it is also estimated to be [illegible] percent by the DTMR.

Today cache memories have been integrated with their corresponding microprocessors on a single chip, giving so-called on-chip cache memories. The Z80000 microprocessor produced by Zilog in 1985 includes a 256-byte on-chip cache memory which is organized into 16 lines, 16 bytes each, as a fully associative cache. The maximum clock frequency for the Z80000 is 25 MHz, and when the Z80000 fetches from its cache, only one system clock cycle is required [28]. The least recently used (LRU) line replacement algorithm is used to choose the line to be replaced by the new one from the main memory in the case of a line miss. The write-through algorithm is used in this cache for its writing strategy. When there is a miss caused by a write operation, only the main memory is updated. This cache has a miss ratio of [illegible] percent for a non-burst transfer mode and 12 percent for a burst transfer mode [29]. It is predicted to have, as a unified cache, a miss ratio of [illegible] percent using the DTMR.

A cache memory has also been applied to the Balance multiprocessor system introduced by Sequent Computer Systems Inc. in 1988 [37]. This multiprocessor system can pool up to thirty 32-bit processors with a shared main memory. A subsystem in this system is composed of an NS32032 microprocessor, an NS32081 floating-point unit, and an NS32082 paged virtual memory management unit, produced by National Semiconductor. In addition, each subsystem has an 8K-byte two-way set-associative cache memory to achieve high performance while minimizing bus traffic. In this cache, with a 50 ns cycle time, there are 512 sets, two lines each, and 8 bytes per line. The write-through policy is employed to keep all the copies in the system consistent. Whenever there is a write request from one processor in the system, this request with the corresponding address is sent to update stale data in the shared memory while it is broadcast to all the caches to see if there are any copies of the data to be updated. If so, the corresponding cache controller invalidates the affected line. The miss ratio of a single-thread cache memory is [illegible] percent [31], while the predicted miss ratio from the DTMR is [illegible] percent.

Since the cache miss ratio is very dependent on the programs that execute on the cache-based systems, and the models in [9] are ideal (in general, a real cache memory is more complicated, and there are more factors to be considered), we can see that the design target miss ratios are slightly higher than those seen in the simulations described above, and close to those from measured results, such as for the VAX-11/780. This lends some credibility to the use of the DTMR as a reasonable estimator of cache performance, as noted in [11]. Thus, the set of design target miss ratios is very useful for the design and implementation of a possibly new cache architecture.

Also, we can see that the line sizes of the systems discussed above seem too small. A larger line size provides a lower miss ratio under a fixed cache size. It is clear that caches using set-associativity have a lower miss ratio than those using the direct-mapped method. Another problem is that the above systems which use set-associativity have a small set size, which affects the cache miss ratios.

In addition, for implementations of existing cache memories, almost all caches are implemented in either multi-chip or on-chip configurations. In the case of chip sets, several chips, including one cache controller and several high-speed static RAM chips, are used to build a cache memory. This kind of cache memory is designed for special processors and has a fixed cache size. Such caches do not have much flexibility; for example, the cache size cannot be changed after the cache controller is designed, and they have a longer delay time between the cache controller and the RAM chips. An on-chip cache does not have a delay penalty due to interconnection between the chips of a multi-chip cache memory, but on-chip caches have the same problem of inflexibility as the multi-chip cache memories. In addition, this kind of cache in general has only a small capacity using today's technology, which leads to a higher miss ratio.

Using VLSI technology, we can make trade-offs to get a high-speed cache memory chip with little delay penalty by eliminating the wire-connection delay between the cache controller and the cache data memory. Multiple uniform cache chips can be used to build cache systems of various sizes, associated with one processor. This cache system can be used as a traditional unified cache for both instructions and data, or as a separate instruction or data cache.

3 IMPLEMENTATION OF THE CACHE ALGORITHMS

3.1 Cache Design Parameters

Typically, a cache memory system can capture well over 90 percent of all references to main memory. Optimization of the cache design parameters is very important to decrease the cost/performance ratio for high-performance cache memories.

Optimizing the design of cache memory has four aspects [9]:

1. maximizing the hit ratio

2. minimizing the access time to cache data

3. minimizing delay due to a cache miss

4. minimizing the overhead of updating main memory and maintaining cache coherence

In addition, for cache memories for multiprocessor systems, consideration has to be taken to maximize bus and shared memory bandwidth and to minimize the bus bandwidth required by each processor in order to maximize the system performance. There are also trade-offs which depend on the technology of implementation for the cache; for example, between hit ratio and access time.

There are many factors to be considered during cache design which affect system performance. Parameters for cache design are classified into intrinsic and extrinsic parameters [5]. Effective memory speed and cost are two intrinsic parameters. Extrinsic parameters, such as hit ratios, control algorithms, etc., are selected based on the results of experimental data and simulation, and are variables which must be considered for the system design.

Of all the considerations which are related to cache memory, the following are mainly considered during design since cache performance is sensitive to changes concerning these aspects:

1. Fetch policies

2. Mapping policies

3. Replacement policies

4. Swapping policies

5. Hit ratio and access time

6. Cache memory capacity

7. Line size

8. Cache data path width

9. Main memory organization

Fetch algorithms are used to determine when the system brings information into the cache memory. In general, the major fetch algorithms are demand-fetch and prefetch. Under the demand fetch algorithm, a line is fetched only if it is needed. The prefetch algorithm, on the other hand, gets information before it is needed. Therefore, the prefetch algorithm is based on some kind of prediction about which line will be used next. It must be designed carefully if the machine performance is to be improved rather than degraded [9]. Implementation of a prefetch algorithm is usually more complicated than demand fetch.

Mapping policies are used to translate the logical address space to the real address space. Efficient address translation schemes should accomplish address translation in such a way as to minimize the apparent access time. Information generally is obtained from the cache associatively; a larger associative memory is more expensive and slower. Hence, there must be some trade-off of associativity during cache design, in terms of the design and technologies that are employed. A mapping such that any of the lines in main memory can be mapped into any line slot in cache memory is called a full associative mapping. That is, a line of main memory may be mapped into any location of the cache memory. Typically, the length of a line in cache memory is the same as that of main memory. If the cache memory is full and there is a miss, the requested line can be transferred into any line slot of cache memory from main memory, in a manner depending on the replacement policy employed. Thus this mapping provides the minimum probability of line slot contention problems and the largest hit ratio for a given problem. However, having one comparator per address tag makes it very difficult and costly to implement, especially in a large cache memory.

A direct-mapped cache has only one comparator, which is connected to all the address tags in cache memory. Each time, only one address tag can be selected to compare with the address from the processor. This mapping is a many-to-one mapping. That is, any given line in main memory can reside logically only in one specified line slot in cache memory. A direct-mapped cache memory mandates a fixed replacement policy; if there is a line miss, both the cache tag and the corresponding line are replaced with the requested main memory address and its line. This mapping has the highest probability of cache memory slot contention since there is a fixed replacement scheme. Furthermore, it generally has a relatively low hit ratio. Unlike the full-associative mapping, it is quite simple and easy to implement.

A third mapping method is an S-way set-associative mapping, which is a hybrid of the direct-mapped and full-associative methods. An S-way set-associative cache has multiple sets which can be selected by direct mapping, and there are S line slots in each set which can be simultaneously compared with the address from the processor. In this mapping system, there are S comparators, one comparator for each "WAY". Set-associative mapping has a reasonable implementation complexity and hit ratio. Increasing the cache size of a set-associative cache gives a greater hit ratio than increasing the depth of a direct-mapped system. On the other hand, increasing the number of sets, or ways, of a set-associative cache memory also gives a greater hit ratio. Hence many high-performance cache memories, especially large scale caches, adopt the set-associative mapping mechanism as a compromise between complexity and performance. More details about S-way set-associative mapping are given in the next chapter.

An optimal replacement policy would predict the line which will be used in cache memory (or a given set) furthest in the future and which consequently should be discarded when the cache memory (or a given set in cache memory) is full and a cache miss occurs. This policy would keep data in the cache optimized for the highest hit ratio and the maximum system throughput. However, this optimal replacement policy cannot be implemented since it requires a prediction of the future behavior of the running programs. Therefore, some approximation has to be made.

There are three types of practical replacement algorithms commonly used for cache memory systems to approximate this function: first-in first-out (FIFO), random, and least recently used (LRU) line replacement. The FIFO algorithm is based on the principle that the first line to be referred to is predicted to be the line to be used in cache memory (or in a given set) furthest in the future, and that this line is replaced by the new one from main memory. This algorithm does not really reflect the program locality very well, since the first line may be used frequently, but it is easy to implement. The random scheme uses a random number from a random number generator to create the line number of the line which is replaced by a new one whenever there is a replacement need. A cache memory employing this algorithm typically has a low hit ratio since this algorithm is not able to reflect the program locality.

The least recently used line replacement algorithm, which looks backward (at the past), is usually able to reflect the program locality well since it is based on historical line usage. That is, the least used line in the recent past is replaced by the requested line from main memory. Since this algorithm requires more stored information about the past, it is more difficult to implement in hardware, especially in a large scale cache memory. A variation, an approximation of the LRU algorithm, can be used to simplify the hardware implementation. This variation is based on the fact that if a line has not been referenced over a certain time period, it is less likely to be needed next than lines in cache memory (or in a given set) that have been referenced in that period. More details of the least recently used line replacement algorithm are described in the next section.

No one best algorithm exists among the practical replacement algorithms [reference illegible]. Some algorithm, compared with the other algorithms, is better for particular classes of problems and poorer for other classes. However, in general, the LRU algorithm is clearly the best choice for most applications, since it is based on historical line usage (the recent past appears to be a good estimate of the near future), it works well, and it increases the hit ratio when the number of lines is increased.
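
As an illustration of how the FIFO and LRU policies differ within a single set, the following minimal Python sketch counts hits for a short reference trace. The class name, the `hit_ratio` helper, and the trace are hypothetical and are not taken from the thesis; the sketch models only the replacement decision, not the hardware LRU unit.

```python
from collections import OrderedDict


class ReplacementSet:
    """One cache set with a fixed number of line slots (illustrative model)."""

    def __init__(self, ways, policy):
        self.ways = ways            # number of line slots in the set
        self.policy = policy        # "LRU" or "FIFO"
        self.lines = OrderedDict()  # resident tags, next victim first

    def access(self, tag):
        """Return True on a hit; on a miss, load the tag and evict if full."""
        if tag in self.lines:
            if self.policy == "LRU":
                # A hit makes this line the most recently used one.
                self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.ways:
            # Evict the first entry: oldest loaded (FIFO) or
            # least recently used (LRU).
            self.lines.popitem(last=False)
        self.lines[tag] = None
        return False


def hit_ratio(cache_set, trace):
    """Fraction of references in the trace that hit in the set."""
    hits = sum(cache_set.access(tag) for tag in trace)
    return hits / len(trace)


trace = [1, 2, 3, 1, 4, 1, 5, 1]  # hypothetical line tags mapping to one set
lru = hit_ratio(ReplacementSet(3, "LRU"), trace)
fifo = hit_ratio(ReplacementSet(3, "FIFO"), trace)
print(lru, fifo)
```

On this particular trace LRU keeps the frequently reused line resident while FIFO evicts it; other traces can favor a different policy, which is consistent with the remark that no single algorithm is best for all classes of problems.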

Swapping algorithms are designed for transferring a new line from the main memory to the cache when the requested information is not in the cache. Typically, there are two kinds of swapping algorithms: write-through and copy-back. In the write-through scheme, a processor write to cache memory is immediately written through to main memory as well. Therefore, the information in both cache memory and main memory is always consistent. Furthermore, in a multiprocessor environment, it can handle multiple-cache coherence in an easy way. Unlike the write-through scheme, the copy-back scheme (without line miss occurrences) only updates the copies of requested data in cache memory without disturbing main memory. Whenever there is a line miss, cache memory copies back the line to be overwritten to main memory before transferring the requested line to cache memory. It can reduce traffic between cache memory and main memory. However, it requires more complicated logic, and there is a coherence problem between cache memory and main memory, and potentially between multiple caches in a multiprocessor system. In contrast, the write-through method has higher traffic between cache memory and main memory since write operations vary from 10 percent to 30 percent of total references, depending on processor architecture and the particular set of applications. The average percentage of write operations in [9] is 16.

The hit ratio for a cache memory is defined as the probability, or the fraction of times, that a memory request is found in cache memory. If we define the probability of all the references to memory as 1, the miss ratio of cache memory is (1 - hit ratio). The hit ratio for a cache memory is one of the most important factors for the performance evaluation of cache memory. Other important factors affecting the cache performance are the access time for the cache memory, including time to search the directory, and the cache memory cycle time, which is defined as the time in which the processor accesses information in cache memory. The access time of a cache memory is affected not only by the architecture, or design (including all the algorithms and parameters selected in cache design and the implementation of the algorithms in hardware), but also by the technology adopted (bipolar, CMOS, etc.).

The cache capacity is usually dictated by many factors having to do with the system cost and performance. In general, a large cache capacity can produce a higher hit ratio, and in turn a better performance. However, there are some limitations on cache size beyond which cache memory either has a high cost or decreases in performance due to the long access time.

The line size of cache memory is one of the most important parameters which sensitively affect cache performance. There are a number of trade-offs for a reasonable line size in terms of architecture and technology. Using VLSI technology, a larger line size is preferred because it achieves a lower miss ratio without much extra cost. But if it is too large, it increases line transfer time and, in turn, decreases system speed even if the hit ratio is increased. It also depends on the data path width between cache and main memory.

The cache data path width must be considered during the design process since it directly determines the time required when a line is transferred from main memory to cache memory. From the point of view of performance, the cache data path should be as wide as possible. It is clear, however, that a wide cache data path is expensive. Doubling the path width means doubling the number of lines in and out of the cache and all the associated circuitry. The path width is critically important to caches implemented using VLSI technology because of the limited number of I/O pins on a chip. Hence, a trade-off of the cache data path width has to be made during the cache design to achieve a reasonable cost/performance ratio.

Although the use of cache memory in computer systems can greatly reduce direct references to main memory, memory traffic is still a very significant performance factor, especially in a multiprocessor system. Memory traffic consists of two components: fetch traffic and write-through or copy-back traffic. The fetch traffic arises from the transfer of data from the main memory to the cache, while the write-through or copy-back traffic is from the cache to the main memory. The fetch traffic can be obtained by multiplying the miss ratio by the line size to get traffic in bytes/reference. The write-through traffic can be calculated by multiplying a write ratio (the ratio of writes to total references) by the number of bytes per write operation. Similarly, the copy-back traffic can be determined by multiplying the miss ratio by the line size, since a line miss causes writing of an existing cache line in the cache into the main memory before transferring the requested missing line to the cache. For evaluation of a cache-based multiprocessor system with a single bus, the bus utilization can be used to estimate the memory traffic. The bus utilization is defined as the ratio of time spent doing useful work to the total run time of the bus.
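
The traffic estimates described above can be written out as a short calculation. In the Python sketch below, the function names are hypothetical, and the miss ratio of 0.05 and the 4 bytes per write are assumed values chosen only for illustration; the write ratio of 0.16 corresponds to the average write percentage quoted earlier, and 32 bytes is the line size of the cache described in this thesis.

```python
def fetch_traffic(miss_ratio, line_size):
    """Bytes per reference moved from main memory to the cache."""
    return miss_ratio * line_size


def write_through_traffic(write_ratio, bytes_per_write):
    """Bytes per reference written through from the cache to main memory."""
    return write_ratio * bytes_per_write


def copy_back_traffic(miss_ratio, line_size):
    """Bytes per reference copied back, charging one full line per miss."""
    return miss_ratio * line_size


MISS_RATIO = 0.05      # assumed for illustration
LINE_SIZE = 32         # bytes per line
WRITE_RATIO = 0.16     # average write percentage quoted in the text
BYTES_PER_WRITE = 4    # assumed: one 32-bit word per write

wt_total = (fetch_traffic(MISS_RATIO, LINE_SIZE)
            + write_through_traffic(WRITE_RATIO, BYTES_PER_WRITE))
cb_total = (fetch_traffic(MISS_RATIO, LINE_SIZE)
            + copy_back_traffic(MISS_RATIO, LINE_SIZE))
print(f"write-through: {wt_total:.2f} bytes/reference")
print(f"copy-back:     {cb_total:.2f} bytes/reference")
```

Note that this simple model charges a full line write for every miss, so the copy-back figure it produces should be read as an upper estimate under that assumption rather than as a measured result.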

Since decreasing memory traffic or transfer time during a line miss can increase the system performance, optimization of the organization of both the main memory and the interconnection network is a key factor for high system performance and low cost. For the interconnection network, a wide data path can reduce the transfer time, but the cost is much higher. On the other hand, if the main memory is made up of several modules which can operate independently, traffic can be reduced because more than one module can be busy writing at one time. Furthermore, if modules can transfer different words in a line by interleaving, transferring a line from main memory to cache only takes one main memory cycle. Thus, the main memory bandwidth can be increased while the transfer time is greatly decreased.

3.2 The Structure of the Cache Memory

During the design of the cache memory described here, the algorithms and parameters used have been selected carefully, and a number of trade-offs between them have been made in order to achieve high performance. The cache memory system described here is implemented as a single chip. Furthermore, this implementation allows a cache of variable capacity (larger than the capacity of a single cache chip) by using several of the cache memory chips. The single-chip cache memory described here has a capacity of 8K bytes because of silicon area limitations for the 3 micron CMOS technology. The word size for this cache memory is 32 bits since the cache is designed for a 32-bit computer system. A word is not necessarily the smallest unit that the processor can access. The processor can directly access 1, 2, 3, or 4 bytes from the cache. Therefore, it provides more flexibility to computer systems in which the cache is used. It also allows for the possibility that this cache can be used in 16-bit computer systems, provided the control signals for the cache can connect with those of the processor with reasonable additional logic. Two clock phases, CK1 and CK2, are employed to pipeline this system. Each of the clock phases has a minimum cycle period of 36 nanoseconds (derived from simulation) in which the associated processor can read an instruction or data from the cache. Fig. 3 depicts the structure of this cache memory. It is composed of four basic components as follows: the Address Translation Function or Directory, the Line Replacement Unit (LRU), the Cache Memory and the Control Unit.

Figure 3: The Basic Cache Memory Structure

During CK1, the address from the processor is latched in the address register of the cache, and then it is sent to the directory to see if the line containing the requested data is in the cache. If so, the line number generated by the line number generator, the set number from the address register, and the word offset in the line are all combined to form a word address for the cache memory and latched into the cache memory register. In addition, the proper byte(s) can be accessed by the processor using both the two least significant bits of the address from the address register and two function bits from the processor, which will be described later. Meanwhile, the LRU unit is updated to indicate that the line referenced is the most recently used one in the given set. During CK2, a read/write operation is done. If the requested data is not in the cache, a line miss occurs. The LRU unit is asked to send the least recently used line number in the specified set to the directory, and the directory uses this number to locate its corresponding line slot in the specified set of the directory. Then the contents of this slot are replaced with the group number and the line number in the address register. After replacement, the line number generator gives the line number corresponding to this cell to the memory register to transfer the requested line from the main memory to the cache.

To obtain the line of information from main memory, a line miss signal is sent to main memory. After the cache receives a "bus use" grant from the system bus controller, the processor is forced to be idle during the transfer. There are 8 words (32 bytes) to be transferred from the main memory to the cache memory during a line miss, which would normally take a long time and in turn decrease system performance. In order to reduce the line transfer time, the main memory may be partitioned into several modules, or "interleaved" (in this case, the memory should be partitioned into 8 modules). Whenever there is a line miss, the 8 words of the requested line can then be sent to the cache memory almost simultaneously, with each word coming from a separate module of the main memory. Thus, the transfer time can be greatly decreased. The main memory organization will be discussed later in chapter 6.
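
The effect of interleaving on the line transfer can be illustrated with a small calculation. The Python sketch below is a simplified model, not the thesis's timing analysis: it assumes that each module delivers one word per main memory cycle and it ignores bus arbitration and any time needed to move the words over the bus. The function name is hypothetical.

```python
import math


def line_transfer_cycles(words_per_line, modules):
    """Main memory cycles to deliver one line, one word per module per cycle."""
    return math.ceil(words_per_line / modules)


# An 8-word line, as used by this cache, with different degrees of interleaving.
for modules in (1, 2, 4, 8):
    print(modules, "module(s):", line_transfer_cycles(8, modules), "cycle(s)")
```

Under these assumptions, 8 modules bring the 8-word line transfer down to a single main memory cycle, which matches the partitioning suggested in the text.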

3.3 The Address Space Mapping

Since the cache, as the fastest part of the memory hierarchy, is much smaller than the main memory, there has to be a mapping function between the cache address space and that of the main memory. As discussed previously, the direct-mapped method is the simplest to implement, but it has the highest miss ratio of the three mapping methods. Therefore, it is not used in this application. Also, the fully-associative method requires one comparator per line slot in the directory. This is costly to implement in a large scale cache memory. Also, it may introduce an extra tag-search delay and make the search logic complicated. The set-associative method, a hybrid of the direct-mapped method and the fully-associative method, is used in this mapping mechanism. It involves organizing the cache memory into N sets of S lines per set. When S becomes one, the cache is a direct-mapped cache in which there are N sets in total, each consisting of a single line. If N becomes one, the organization of the cache is the fully-associative cache memory. Since an S-way set-associative cache allows any one of S lines in a referenced set to be replaced on a line miss, this flexibility usually introduces a lower miss ratio without the complexity of a fully-associative cache. Therefore, it is a compromise between complexity and performance.

Figure 4: The Set-associative Mapping

3.4 The Set-associative Mapping

The principle of set-associative mapping is shown in Fig. 4. The cache memory is divided into $2^q$ sets with $2^s$ line slots in each, and the sizes of the sets and lines in the cache memory are the same as those in the main memory. Furthermore, the main memory is partitioned into several groups, and the size of each group is equal to the size of the cache memory. Hence, each group contains $2^q$ sets. Each set slot in the cache memory must be shared by several sets of the main memory. For example, in Fig. 4, the first set in the cache memory is assigned to hold the sets $1$, $1 + 2^q$, $1 + 2 \times 2^q$, ... of the main memory and the second set is assigned to hold the sets $2$, $2 + 2^q$, $2 + 2 \times 2^q$, ..., and so on. Lines within a set of the main memory are associatively mapped into any of the $2^s$ line slots in the corresponding set of the cache memory. That is, any set in the main memory can only be directly mapped to a specific set of the cache memory, and lines in a set are associatively mapped into any of the $2^s$ line slots in the corresponding set. Sets from different groups can be intermixed within the cache memory; therefore not all the sets of a given group need to be simultaneously resident in the cache memory. Similarly, lines in the sets from different groups, which are mapped into the same set of the cache memory, can also be intermixed within that set of the cache memory. For example, line 1 of set 1 of group 1 can be assigned to line slot 2 of set 1 in the cache memory, and line 2 of set 1 of group 2 can also reside in line slot 1 of set 1 at the same time.

In Fig. 4, we can see that a memory address is divided into four parts: nd represents the group number, q is the set number, s is the line number and o is the word offset within a line.
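
To make the four-part address concrete, the following Python sketch splits a 32-bit address into its fields. The field widths are derived from figures given elsewhere in the thesis (32 sets, 8 line slots per set, 32-byte lines, and a 22-bit nd+s tag), which leaves 19 bits for the group number. The ordering of the fields within the address, the treatment of the offset as a 5-bit byte offset, and all function names are assumptions made for illustration; the thesis's Fig. 4 may arrange the fields differently.

```python
ND_BITS = 19      # group number (32 - 5 - 3 - 5)
Q_BITS = 5        # set number: 2**5 = 32 sets
S_BITS = 3        # line number within a set: 2**3 = 8 lines
OFFSET_BITS = 5   # byte offset within a 32-byte line (3 word bits + 2 byte bits)


def split_address(addr):
    """Split a 32-bit address into (nd, q, s, offset), assumed high to low."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    s = (addr >> OFFSET_BITS) & ((1 << S_BITS) - 1)
    q = (addr >> (OFFSET_BITS + S_BITS)) & ((1 << Q_BITS) - 1)
    nd = addr >> (OFFSET_BITS + S_BITS + Q_BITS)
    return nd, q, s, offset


def directory_tag(addr):
    """Concatenate group number and line number into the 22-bit nd+s tag."""
    nd, _, s, _ = split_address(addr)
    return (nd << S_BITS) | s


# Hypothetical address: group 5, set 17, line 6, byte offset 9.
addr = (5 << 13) | (17 << 8) | (6 << 5) | 9
print(split_address(addr), directory_tag(addr))
```

The set number selects one set directly, while the nd+s tag is what has to be compared associatively against the line slots of that set.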

3.5 Implementation of the Directory

In general, increasing the degree of associativity of a cache memory can decrease the miss ratio of a cache. In order to obtain high performance, an 8-way set-associative mapping is employed in this directory to achieve a high hit ratio without an extra delay penalty while searching the directory. This directory can map the main memory address space (32 bits) to the cache memory address space (13 bits) in a maximum of 16 nanoseconds, including the delay of the LRU unit, as determined by simulation. Fig. 5 shows the organization of the address mapping directory with set-associative mapping. When the processor requests a read/write operation, the logical address is mapped into the cache memory address by searching the directory.

Figure 5: The Directory

This directory has a tag array of 32 sets with 8 line slots in each set. Each set is represented by a row of the tag array and the 8 line slots in a given set are indicated by columns of the tag array. Therefore, this directory is an 8-way set-associative directory in which each column represents one way.

There is a MATCH signal for each column of slots. If the signal is set, the requested line slot is in this column of slots. Among the 8 MATCH signals, each of which connects to a column of the tag array, there is only one MATCH signal valid at any time since only one slot may be selected. All the MATCH lines are connected to a line number generator. The line number generator can translate the i-th column number, at which the corresponding MATCH signal is valid, into a binary number i forming the requested line number for the cache memory. That is, the slot at the position in the i-th row and the j-th column can produce the line number j of the j-th line of the i-th set of the cache memory via the line number generator. In each line slot, the group number is concatenated with the line number (nd+s) of a logical address, which indicates that the specified line of the main memory resides in the cache line indicated by the line slot. Whenever a requested set is selected through the 32-bit directory decoder after an address is latched into the address register, the contents of the 8 line slots in the selected set are simultaneously compared with nd+s of the address register. If the contents of any one of the 8 line slots are the same as nd+s, then the requested data are in the line of the selected set, and the corresponding MATCH signal should become valid to make the line number generator produce the requested line number for the cache memory. A HIT flag is also generated by the line number generator to indicate that the requested data are in the cache. The line number is combined with the set number and the word offset in the line from the address register to form the required word address of the cache memory. Meanwhile, the HIT flag informs the LRU unit to update the records in the selected set. Otherwise, a MISS flag is set to indicate that the requested data are not in the cache, at which time there are three tasks to be done:

• Invoke the LRU replacement unit to find the least recently used line in the selected set of the cache. This line will be replaced by a new one when the requested data are available from the main memory.

• Inform the CPU that it must be idle during the line replacement.

• Request the main memory to transfer the required line to the cache.
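
The directory search described above can be modelled in software as follows. This Python sketch is only a behavioral illustration of the hardware: the class and function names are hypothetical, the victim slot on a miss is passed in from outside (standing in for the LRU unit), and the ordering of the fields in the 13-bit cache address (line number, set number, 5-bit offset) is an assumption.

```python
SETS = 32  # rows of the tag array
WAYS = 8   # columns of the tag array (line slots per set)


class Directory:
    """Behavioral model of the 8-way set-associative tag array."""

    def __init__(self):
        # tags[set_no][way] holds a stored nd+s tag, or None for an empty slot.
        self.tags = [[None] * WAYS for _ in range(SETS)]

    def lookup(self, set_no, tag):
        """Compare the tag with all slots of the selected set at once.

        Returns (hit, line_no); line_no is None on a miss.
        """
        match = [slot == tag for slot in self.tags[set_no]]  # MATCH signals
        if any(match):
            # Line number generator: column index of the valid MATCH signal.
            return True, match.index(True)
        return False, None

    def replace(self, set_no, way, tag):
        """Store a new tag in the slot chosen by the replacement unit."""
        self.tags[set_no][way] = tag


def cache_word_address(line_no, set_no, offset):
    """Form the 13-bit cache address (assumed field order: line, set, offset)."""
    return (line_no << 10) | (set_no << 5) | offset


directory = Directory()
first = directory.lookup(3, 46)       # empty directory, so this is a miss
directory.replace(3, 5, 46)           # slot 5 chosen by a hypothetical LRU unit
second = directory.lookup(3, 46)      # the same tag now hits in column 5
print(first, second, cache_word_address(5, 3, 9))
```

In the hardware the eight comparisons happen in parallel within one clock phase; the list comprehension here only mimics the result of that comparison.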

Fig. 6 shows a simulation for the tag array of the directory for the case where a line miss occurs during a read/write operation, and the address residing in slot i of row j is replaced with that in the address register. Signals b0 to b21 represent the nd+s from the address register. SELj is a signal from the decoder to select row j of the directory. WORDi is a signal from the LRU unit to update the line slot in row j and column i of the tag array during a line miss, and HITi is the i-th MATCH line of the directory in Fig. 5, indicating whether or not the corresponding column is matched. After the tag array is reset by the RES signal, a miss (a low HITi) is produced since the j-th row of tags selected by SELj is empty. The signal WORDi from the LRU unit updates the slot in row j and column i of the directory. After a delay of 6 nanoseconds, the HITi signal becomes high to indicate that the updating has been finished. When the signals b0 to b21 are changed to a new address and the new address is not found in the directory, then the HITi signal becomes low after a 3 ns delay.

3.5.1 The Line Slot of the Directory

As described before, the directory is composed of line slots. Each of the line slots is used to store the group number and line number (nd+s) of a given main memory address, which indicates that the corresponding line from the main memory is in the cache memory. For each line slot, there is a 22-bit built-in comparator to be

[Figure 6: simulation waveforms of the directory tag array (signals hit, b0 to b21, sel, word)]

Figure 7: The Directory Tag. (a) The Tag Bit of the Directory

Figure 8: The Tag Bit and Valid Bit of the Directory
