COMPUTER PROGRAMMING
FASCICLE 1
MMIX
DONALD E.KNUTH StanfordUniversity
ADDISON{WESLEY
6
77
Seealsohttp://www-s-faulty.stanford.edu/~knuth/mmix.htmlfordownloadable
software,andhttp://mmixmasters.soureforge.netforgeneralnewsaboutMMIX.
Copyright 1999byAddison{Wesley
Allrightsreserved.Nopartofthispubliationmaybereprodued,storedinaretrieval
system,ortransmitted, in anyform, orby anymeans, eletroni, mehanial,photo-
opying, reording, or otherwise, without the prior onsent of the publisher, exept
that the oÆial eletroni le may be used to print single opies for personal (not
ommerial)use.
Zerothprinting(revision15),15February2004
fasile / fas_
e
k e
l / n ::: 1: asmallbundle::: aninoresene onsistingof
a ompated ymeless apitatethan a glomerule
::: 2: oneof the divisions ofa bookpublishedin parts
|P. B.GOVE, Webster'sThirdNew InternationalDitionary (1961)
This is the first of a series of updates that I plan to make available at
regularintervals as IontinueworkingtowardtheultimateeditionsofTheArt
ofComputerProgramming.
IwasinspiredtopreparefasileslikethisbytheexampleofCharlesDikens,
whoissuedhisnovelsinserialform;hepublisheda dozeninstallmentsofOliver
Twist before havinganyideawhat would beomeof BillSikes! I was thinking
also of James Murray, who began to publish 350-page portions of the Oxford
English Ditionary in 1884, nishing the letter B in 1888 and the letter C in
1895. (Murraydiedin1915whileworkingontheletterT;mytaskis,fortunately,
muh simplerthanhis.)
UnlikeDikensandMurray,I haveomputerstohelpmeeditthematerial,
sothat Ianeasilymakehangesbeforeputtingeverything togetherinitsnal
form. Although I'mtrying mybestto write omprehensiveaountsthat need
nofurtherrevision, Iknowthateverypagebringsmehundredsofopportunities
tomakemistakesandtomissimportantideas. Mylesareburstingwithnotes
aboutbeautifulalgorithmsthathavebeendisovered,butomputersienehas
grown to thepointwhere I annot hopeto beanauthorityonall thematerial
I wish to over. ThereforeI need extensivefeedbak from readersbeforeI an
nalizetheoÆial volumes.
Inotherwords,IthinkthesefasileswillontainalotofGoodStu,andI'm
exited about the opportunity to presenteverything I write to whoeverwants
to read it, but I also expet that beta-testers like you an help me make it
Way Better. As usual, I will gratefully pay a reward of $2.56 to the rst
person who reports anything that is tehnially, historially, typographially,
orpolitiallyinorret.
Charles Dikensusually published his work onea month, sometimes one
aweek;JamesMurraytendedtonisha 350-pageinstallmentaboutoneevery
18months. Mygoal,Godwilling,istoproduetwo128-pagefasilesperyear.
Most of the fasiles will represent new material destined for Volumes 4 and
higher; but sometimes I will be presenting amendments to one or more of the
earliervolumes. Forexample,Volume 4will needtorefertotopis thatbelong
in Volume 3, but weren't inventedwhen Volume 3 rst ame out. Withluk,
theentire workwillmakesense eventually.
FasileNumberOneisaboutMMIX,thelong-promisedreplaementforMIX.
Thirty years have passedsine theMIX omputer was designed, and omputer
arhiteture has been onverging during those years towardsa rather dierent
styleofmahine. ThereforeIdeidedin1990toreplaeMIXwithanewomputer
thatwouldontainevenless saturatedfatthanitspredeessor.
Exerise 1.3.1{25 in the rst three editions of Volume 1 spoke of an ex-
tendedMIXalledMixMaster,whihwasupwardompatiblewiththeoldversion.
ButMixMaster itself haslong beenhopelessly obsolete. It allowed forseveral
gigabytes of memory, but one ouldn't even use it with ASCII ode to print
lowerase letters. And ouh, its standard subroutine alling onvention was
irrevoably based onself-modifying instrutions! Deimal arithmeti and self-
modifying ode were popularin 1962, but they sure havedisappeared quikly
asmahineshavegottenbiggerandfaster. FortunatelythenewRISCmahines
haveaveryappealingstruture,soI'vehada hanetodesignanewomputer
thatisnotonlyuptodatebut alsofun.
Many readers are no doubt thinking, \Why does Knuth replae MIX by
anothermahine insteadofjuststikingto ahigh-levelprogramminglanguage?
Hardlyanybodyuses assemblersthese days." Suhpeople areentitledto their
opinions, and theyneed notbother reading the mahine-language parts of my
books. But the reasons for mahine language that I gave in the prefae to
Volume 1,writtenin theearly1960s,remainvalidtoday:
Oneoftheprinipal goals ofmybooksis toshowhowhigh-levelonstru-
tionsare atually implemented in mahines, notsimply to show how they
are applied. I explain oroutine linkage, tree strutures, random number
generation, high-preision arithmeti, radix onversion, paking of data,
ombinatorialsearhing, reursion,et.,from thegroundup.
The programs needed in my books are generally so short that their main
pointsanbegraspedeasily.
Peoplewhoare morethan asuallyinterestedin omputers shouldhaveat
least some idea of what the underlying hardware is like. Otherwise the
programstheywrite willbeprettyweird.
Mahinelanguageisneessaryinanyase,asoutputofsomeofthesoftware
thatI desribe.
Expressingbasi methods like algorithmsfor sorting andsearhing in ma-
hinelanguagemakesitpossibletoarryoutmeaningfulstudiesoftheeets
ofaheand RAMsizeandotherhardwareharateristis (memoryspeed,
pipelining, multiple issue, lookaside buers, thesize of ahe bloks, et.)
whenomparingdierentshemes.
Moreover, if I did use a high-level language, what language should it be? In
the 1960s I would probably have hosen Algol W; in the 1970s, I would then
have had to rewrite my books using Pasal; in the 1980s, I would surely have
hangedeverything toC; in the1990s,I would havehadto swith to C ++
and
thenprobablyto Java. Inthe2000s, yet anotherlanguage will nodoubtbede
rigueur. I annot aord thetime to rewritemy books as languages goin and
outoffashion; languagesaren'tthepointofmybooks,thepointisratherwhat
youan doin yourfavoritelanguage. Mybooksfousontimeless truths.
ThereforeIwillontinuetouseEnglishasthehigh-levellanguageinTheArt
of Computer Programming, and I will ontinue to use a low-level language
to indiate how mahines atually ompute. Readers who only want to see
algorithmsthatarealreadypakagedinaplug-inway,usingatrendylanguage,
shouldbuyotherpeople'sbooks.
Thegoodnewsis thatprogramming forMMIX ispleasantandsimple. This
fasilepresents
1) a programmer's introdution to the mahine (replaing Setion 1.3.1 of
Volume1);
2) theMMIXassembly language(replaingSetion1.3.2);
3) newmaterialonsubroutines,oroutines,andinterpretiveroutines(replaing
Setions1.4.1,1.4.2,and1.4.3).
Ofourse, MIX appearsin manyplaesthroughout Volumes1{3,anddozens of
programsneed to berewritten for MMIX. Readers whowould like to help with
thisonversion proessareenouraged tojoin the MMIXmasters, ahappy group
ofvolunteersbasedatmmixmasters.soureforge.net.
I am extremely grateful to all the people whohelped me with the design
of MMIX. In partiular, John Hennessy and Rihard L. Sites deserve speial
thanksfortheirativepartiipationandsubstantialontributions. Thanksalso
toVladimirIvanoviforvolunteeringtobetheMMIX grandmaster/webmaster.
Stanford, California D.E.K.
May 1999
You an, if you want, rewrite forever.
| NEIL SIMON,Rewrites: A Memoir (1996)
Chapter 1|Basi Conepts . . . 1
1.3. MMIX . . . 2
1.3.1. DesriptionofMMIX . . . 2
1.3.2. TheMMIXAssemblyLanguage . . . 28
1.4. SomeFundamentalProgrammingTehniques . . . 52
1.4.1. Subroutines . . . 52
1.4.2. Coroutines . . . 66
1.4.3. InterpretiveRoutines . . . 73
Answers to Exerises . . . 94
IndexandGlossary . . . 127
1.3. MMIX
Inmanyplaesthroughoutthisbookwewillhaveoasiontorefertoaom-
puter'sinternalmahinelanguage. Themahineweuseisa mythialomputer
alled \MMIX." MMIX|pronounedEM-miks|is very muh likenearly every
general-purposeomputer designedsine1985,exeptthatitis, perhaps,nier.
Thelanguage of MMIX ispowerfulenoughto allowbriefprogramsto bewritten
formostalgorithms,yetsimple enoughsothatitsoperationsareeasilylearned.
The reader is urged to study this setion arefully, sine MMIX language
appears in so many parts of this book. There should be no hesitation about
learningamahinelanguage;indeed,theauthoronefounditnotunommonto
bewritingprogramsinahalfdozendierentmahinelanguagesduringthesame
week! Everyonewithmorethanaasualinterestinomputerswillprobablyget
toknowatleastonemahine languagesooneror later. Mahinelanguagehelps
programmersunderstand whatreallygoesoninsidetheiromputers. Andone
onemahinelanguage hasbeenlearned,theharateristis ofanotherare easy
to assimilate. Computer sieneis largelyonerned withan understanding of
howlow-leveldetailsmakeitpossibletoahievehigh-levelgoals.
Software for runningMMIX programs on almost any real omputer an be
downloaded from the website for this book (seepage ii). The omplete soure
odefortheauthor'sMMIXroutinesappearsinthebookMMIXware[LetureNotes
in Computer Siene 1750 (1999)℄; that book will be alled \the MMIXware
doument"inthefollowingpages.
1.3.1. Desriptionof MMIX
MMIXisa polyunsaturated,100%natural omputer. Likemostmahines, ithas
anidentifyingnumber|the2009. Thisnumberwas foundbytaking 14atual
omputers very similar to MMIX and on whih MMIX ould easily be simulated,
thenaveragingtheirnumberswith equalweight:
CrayI +IBM801 +RISCII+ClipperC300+AMD29K+Motorola88K
+IBM601+Inteli960+Alpha21164+POWER2+MIPSR4000
+HitahiSuperH4+StrongARM110+Spar64
=14
=28126=14=2009: (1)
The same number may also be obtained in a simpler way by taking Roman
numerals.
Bits and bytes. MMIX works with patterns of 0s and 1s, ommonly alled
binarydigits or bits,and itusually deals with 64 bitsat a time. For example,
the64-bitquantity
1001111000110111011110011011100101111111010010100111110000010110 (2)
isa typialpattern that themahine mightenounter. Longpatternslikethis
anbeexpressedmore onvenientlyifwegroupthebits fourat atimeand use
hexadeimaldigits torepresenteah group. Thesixteen hexadeimaldigitsare
0=0000;
1=0001;
2=0010;
3=0011;
4=0100;
5=0101;
6=0110;
7=0111;
8=1000;
9=1001;
a=1010;
b=1011;
=1100;
d=1101;
e=1110;
f=1111:
(3)
Weshallalwaysuseadistintivetypefaeforhexadeimaldigits,asshownhere,
sothat theywon'tbeonfusedwiththedeimaldigits0{9;andwewillusually
alsoputthesymbol
#
justbeforeahexadeimalnumber,tomakethedistintion
evenlearer. Forexample,(2) beomes
#
9e3779b97f4a716 (4)
in hexadeimalese. Upperase digits ABCDEF are oftenused instead of abdef ,
beause
#
9E3779B97F4A7C16 looks better than
#
9e3779b97f4a716 in some
ontexts;thereisnodiereneinmeaning.
A sequene of eight bits, or two hexadeimal digits, is ommonly alled
a byte. Most omputers now onsider bytes to be their basi, individually
addressable units of information; we will see that an MMIX program an refer
toasmanyas2 64
bytes,eahwithitsownaddressfrom
#
0000000000000000 to
#
ffffffffffffffff. Letters, digits, and puntuationmarks of languages like
English areoften representedwithone byte perharater, using theAmerian
Standard Code for Information Interhange (ASCII). For example, the ASCII
equivalent of MMIX is
#
4d4d4958. ASCII is atually a 7-bit ode with ontrol
haraters
#
00{
#
1f,printingharaters
#
20{
#
7e,anda\delete"harater
#
7f
[see CACM 8 (1965), 207{214; 11 (1968), 849{852; 12 (1969), 166{178℄. It
wasextendedduringthe1980stoaninternationalstandard8-bitodeknownas
Latin-1or ISO8859-1,therebyenodingaentedletters: p^ate is
#
70e274e9.
\Ofthe 256th squadron?"
\Ofthe ghting256th Squadron," Yossarian replied.
::: \That'stwo tothe ghtingeighthpower."
| JOSEPH HELLER, Cath-22 (1961)
A16-bitodethatsupportsnearlyevery modernlanguagebeameaninter-
nationalstandard during the1990s. This ode, known as Uniodeor ISO/IEC
10646UCS-2,inludes notonlyGreekletterslikeS ands (
#
03a3 and
#
033),
CyrilliletterslikeWandw (
#
0429 and
#
0449),Armenian letterslike and
(
#
0547 and
#
0577), Hebrew letters like Y (
#
05e9), Arabi letters like
(
#
0634), and Indian letters like f (
#
0936) or x (
#
09b6) or S (
#
0b36) or
(
#
0bb7), et., but alsotensof thousandsof East Asian ideographssuh as the
Chinese harater for mathematis and omputing, (
#
7b97). It even has
speial odes for Roman numerals: MMIX =
#
216f216f21602169. Ordinary
ASCII or Latin-1 haraters are represented by simply giving them a leading
byteofzero: p^ate is
#
007000e2007400e9,al'Uniode.
Wewillusetheonvenienttermwydeto desribea 16-bitquantitylikethe
wideharatersof Uniode,beausetwo-byte quantities arequiteimportantin
pratie. Wealsoneedonvenientnamesforfour-byteandeight-bytequantities,
whihweshallall tetrabytes (or\tetras")andotabytes (or\otas"). Thus
2bytes=1 wyde;
2wydes=1 tetra;
2tetras=1 ota:
Oneotabyteequalsfour wydesequalseightbytes equalssixty-fourbits.
Bytesandmultibytequantitiesan,ofourse,representnumbersaswellas
alphabetiharaters. Usingthebinarynumbersystem,
anunsignedbytean expressthenumbers 0::255;
anunsignedwydeanexpressthenumbers0 ::65,535;
anunsignedtetraanexpressthenumbers0 ::4,294,967,295;
anunsignedotaanexpressthenumbers0 :: 18,446,744,073,709,551,615.
Integersarealsoommonlyrepresentedbyusingtwo'somplementnotation,in
whihtheleftmostbitindiatesthesign: Iftheleadingbitis1,wesubtrat2 n
to
gettheintegerorrespondingtoann-bitnumberinthisnotation. Forexample,
1isthesignedbyte
#
ff;itisalsothesignedwyde
#
ffff,thesignedtetrabyte
#
ffffffff,andthesignedotabyte
#
ffffffffffffffff. Inthis way
asignedbytean expressthenumbers 128 ::127;
asignedwydeanexpressthenumbers 32;768::32,767;
asignedtetraanexpressthenumbers 2;147;483;648::2,147,483,647;
asignedotaanexpressthenumbers 9;223;372;036;854;775;808::
9,223,372,036,854,775,807.
Memoryand registers. Fromaprogrammer'sstandpoint,anMMIXomputer
has 2 64
ells of memory and 2 8
general-purpose registers, together with 2 5
speial registers (see Fig. 13). Data is transferred from the memory to the
registers,transformedin theregisters, andtransferred fromtheregistersto the
memory. Theells ofmemoryarealled M[0℄,M[1℄,:::,M[2 64
1℄;thusifxis
anyotabyte,M[x℄isabyteofmemory. Thegeneral-purposeregistersarealled
$0,$1,:::,$255;thus ifxisanybyte,$xisanotabyte.
The 2 64
bytes of memory are grouped into 2 63
wydes, M
2
[0℄ = M
2 [1℄ =
M[0℄M[1℄,M
2 [2℄=M
2
[3℄=M[2℄M[3℄,:::;eahwydeonsistsoftwoonseutive
bytesM[2k℄M[2k+1℄=M[2k℄2 8
+M[2k+1℄,andisdenotedeitherbyM
2 [2k℄
orbyM
2
[2k+1℄. Similarlythereare2 62
tetrabytes
M
4
[4k℄=M
4
[4k+1℄==M
4
[4k+3℄=M[4k℄M[4k+1℄:::M[4k+3℄;
and2 61
otabytes
M
8
[8k℄=M
8
[8k+1℄==M
8
[8k+7℄=M[8k℄M[8k+1℄:::M[8k+7℄:
Ingeneral if x is any otabyte, thenotations M
2 [x℄, M
4
[x℄, and M
8
[x℄ denote
thewyde, thetetra, and the ota that ontain byte M[x℄; we ignore the least
$0:
$1:
$2:
.
.
. .
.
. .
.
. .
.
. .
.
. .
.
. .
.
. .
.
.
$254:
$255:
rA:
rB:
.
.
. .
.
. .
.
. .
.
. .
.
. .
.
. .
.
. .
.
.
rZZ:
M[0℄ M[1℄ M[2℄ M[3℄ M[4℄ M[5℄ M[6℄ M[7℄ M[8℄
M[2 64
1℄
M[2 64
2℄
M[2 64
3℄
M[2 64
4℄
M[2 64
5℄
M[2 64
6℄
M[2 64
7℄
M[2 64
8℄
M[2 64
9℄
Fig. 13. The MMIX omputer, as seen by a programmer, has 256 general-purpose
registersand 32speial-purposeregisters, togetherwith2 64
bytes ofvirtualmemory.
Eahregisterholds64bitsofdata.
signiantlgtbitsofxwhenreferringtoM
t
[x℄. Forompleteness,wealsowrite
M
1
[x℄=M[x℄,andwedeneM[x℄=M[xmod2 64
℄whenx<0or x2 64
.
The 32 speial registers of MMIX are alled rA, rB, :::, rZ, rBB, rTT,
rWW, rXX, rYY, and rZZ. Like theirgeneral-purpose ousins, theyeah hold
an otabyte. Their uses will be explained later; for example, we will see that
rAontrolsarithmetiinterrupts whilerRholdstheremainderafter division.
Instrutions. MMIX's memory ontains instrutions as well as data. An in-
strutionor\ommand"isatetrabytewhosefourbytesareonventionallyalled
OP,X,Y,andZ. OPistheoperationode(or\opode,"forshort);X,Y,andZ
speifytheoperands. Forexample,
#
20010203isaninstrutionwithOP=
#
20,
X=
#
01, Y=
#
02, and Z=
#
03, andit means\Set $1 to thesum of$2 and
$3." Theoperandbytesarealwaysregardedasunsigned integers.
Eah of the 256 possible opodes has a symboli form that is easy to re-
member. Forexample,opode
#
20isADD . Wewilldealalmostexlusivelywith
symboli opodes; the numeri equivalentsan be found, ifneeded, in Table1
below,andalsoin theendpapers ofthisbook.
The X,Y, and Zbytes alsohavesymboli representations, onsistentwith
the assembly language that we will disuss in Setion 1.3.2. For example,
the instrution
#
20010203 is onventionally written `ADD $1,$2,$3', and the
additioninstrutioningeneraliswritten`ADD$X,$Y,$Z'. Mostinstrutionshave
threeoperands,butsomeofthemhaveonlytwo,andafewhaveonlyone. When
therearetwooperands,therstisXandtheseondisthetwo-bytequantityYZ;
thesymboli notationthen hasonly one omma. For example, theinstrution
`INCL $X,YZ ' inreases register $Xby theamount YZ. Whenthere is onlyone
operand,itistheunsignedthree-bytenumberXYZ,andthesymboli notation
hasno omma at all. For example, we will see that `JMP +4*XYZ' tells MMIX
to nd itsnext instrutionby skipping ahead XYZtetrabytes; the instrution
`JMP+1000000'hasthehexadeimalform
#
f003d090,beauseJMP=
#
f0and
250000=
#
03d090.
We will desribe eah MMIX instrution bothinformally and formally. For
example,theinformal meaningof `ADD$X,$Y,$Z' is\Set $Xto thesumof $Y
and$Z";theformaldenitionis`s($X) s($Y)+s($Z)'. Heres(x)denotesthe
signedinteger orrespondingto thebitpattern x,aordingto theonventions
oftwo'somplementnotation. Anassignmentlikes(x) N meansthatxisto
besettothebitpatternforwhihs(x)=N. (Suhanassignmentausesinteger
overow if N is too large or too small to t in x. For example, an ADD will
overowifs($Y)+s($Z) isless than 2 63
or greaterthan2 63
1. Whenwe're
disussing an instrution informally, we will often gloss over the possibility of
overow;theformaldenition,however,willmakeeverythingpreise. Ingeneral
theassignments(x) NsetsxtothebinaryrepresentationofNmod2 n
,where
nisthenumberofbitsinx,anditsignalsoverowifN < 2 n 1
orN 2 n 1
;
seeexerise5.)
Loading and storing. Although MMIX has256 dierentopodes, we willsee
thattheyfallintoafeweasilylearnedategories. Let'sstartwiththeinstrutions
thattransferinformationbetweentheregistersandthememory.
Eah of the following instrutions has a memory address A obtained by
adding$Yto$Z. Formally,
A= u($Y)+u($Z)
mod2 64
(5)
isthesumoftheunsignedintegersrepresentedby$Yand$Z,reduedtoa64-bit
numberbyignoringanyarrythatoursattheleftwhenthosetwointegersare
added. Inthisformulathenotationu(x)isanalogoustos(x),but itonsidersx
tobeanunsignedbinarynumber.
LDB$X,$Y,$Z(loadbyte): s($X) s M
1 [A℄
.
LDW$X,$Y,$Z(loadwyde): s($X) s M
2 [A℄
.
LDT$X,$Y,$Z(loadtetra): s($X) s M
4 [A℄
.
LDO$X,$Y,$Z(loadota): s($X) s M
8 [A℄
.
Theseinstrutionsbringdata frommemoryintoregister$X,hangingthedata
ifneessary froma signed byte, wyde,or tetrabyteto a signed otabyte of the
samevalue. Forexample,supposetheotabyteM
8
[1002℄=M
8
[1000℄is
M[1000℄M[1001℄:::M[1007℄=
#
0123456789abdef: (6)
Thenif$2=1000and$3=2,wehaveA=1002,and
LDB $1,$2,$3 sets$1
#
0000000000000045;
LDW $1,$2,$3 sets$1
#
0000000000004567;
LDT $1,$2,$3 sets$1
#
0000000001234567;
LDO $1,$2,$3 sets$1
#
0123456789abdef:
Butif$3=5,sothat A=1005,
LDB $1,$2,$3 sets $1
#
ffffffffffffffab;
LDW $1,$2,$3 sets $1
#
ffffffffffff89ab;
LDT $1,$2,$3 sets $1
#
ffffffff89abdef;
LDO $1,$2,$3 sets $1
#
0123456789abdef:
Whena signedbyte orwydeor tetrais onverted to asigned ota,itssign bit
is\extended"into allpositionsto theleft.
LDBU$X,$Y,$Z(loadbyteunsigned): u($X) u M
1 [A℄
.
LDWU$X,$Y,$Z(loadwydeunsigned): u($X) u M
2 [A℄
.
LDTU$X,$Y,$Z(loadtetraunsigned): u($X) u M
4 [A℄
.
LDOU$X,$Y,$Z(loadotaunsigned): u($X) u M
8 [A℄
.
These instrutionsare analogousto LDB ,LDW ,LDT,andLDO , buttheytreatthe
memory data as unsigned; bit positions at the left of the register are set to
zero when a short quantity is being lengthened. Thus, in the example above,
LDBU$1,$2,$3with$2+$3=1005would set$1
#
00000000000000ab.
The instrutions LDO and LDOU atually have exatly the same behavior,
beausenosignextension orpadding withzeros isneessary whenanotabyte
is loaded into a register. Buta good programmer will use LDO when the sign
is relevant and LDOU when it is not; then readers of the program an better
understandthesignianeofwhatisbeingloaded.
LDHT$X,$Y,$Z(loadhightetra): u($X) u M
4 [A℄
2 32
.
Here the tetrabyte M
4
[A℄is loaded into the left half of $X, and therighthalf
is set to zero. For example, LDHT $1,$2,$3 sets $1
#
89abdef00000000,
assuming(6)with$2+$3=1005.
LDA$X,$Y,$Z(loadaddress): u($X) A.
This instrution, whih puts a memory address into a register, is essentially
thesame as the ADDU instrutiondesribed below. Sometimesthe words\load
address"desribeitspurposebetterthanthewords\addunsigned."
STB$X,$Y,$Z(store byte): s M
1 [A℄
s($X).
STW$X,$Y,$Z(store wyde): s M
2 [A℄
s($X).
STT$X,$Y,$Z(store tetra): s M
4 [A℄
s($X).
STO$X,$Y,$Z(store ota): s M
8 [A℄
s($X).
These instrutions go the other way, plaing register data into the memory.
Overowispossible ifthe(signed)numberintheregisterliesoutsidetherange
of the memory eld. For example, suppose register $1 ontains the number
65536=
#
ffffffffffff0000. Thenif$2=1000,$3=2,and(6)holds,
STB $1,$2,$3 sets M
8 [1000℄
#
0123006789abdef (withoverow);
STW $1,$2,$3 sets M
8 [1000℄
#
0123000089abdef (withoverow);
STT $1,$2,$3 sets M
8 [1000℄
#
ffff000089abdef;
STO $1,$2,$3 sets M
8 [1000℄
#
ffffffffffff0000:
STBU$X,$Y,$Z(storebyteunsigned):
u M
1 [A℄
u($X)mod2 8
.
STWU$X,$Y,$Z(storewydeunsigned):
u M
2 [A℄
u($X)mod2 16
.
STTU$X,$Y,$Z(storetetraunsigned):
u M
4 [A℄
u($X)mod2 32
.
STOU$X,$Y,$Z(storeotaunsigned): u M
8 [A℄
u($X).
These instrutions have exatly the same eet on memory as their signed
ounterpartsSTB ,STW ,STT ,andSTO ,but overowneverours.
STHT$X,$Y,$Z(storehightetra): u M
4 [A℄
u($X)=2 32
.
Thelefthalf ofregister$Xisstoredinmemorytetrabyte M
4 [A℄.
STCOX,$Y,$Z(storeonstantotabyte): u M
8 [A℄
X.
Aonstantbetween0and255 isstoredinmemoryotabyte M
8 [A℄.
Arithmetioperators. Mostof MMIX'soperationstakeplaestritlybetween
registers. We might as well begin our study of the register-to-register opera-
tionsbyonsideringaddition,subtration,multipliation,anddivision,beause
omputersaresupposedto beabletoompute.
ADD$X,$Y,$Z(add): s($X) s($Y)+s($Z).
SUB$X,$Y,$Z(subtrat): s($X) s($Y) s($Z).
MUL$X,$Y,$Z(multiply): s($X) s($Y)s($Z).
DIV$X,$Y,$Z(divide): s($X)
s($Y)=s($Z)
[$Z6=0℄,and
s(rR ) s($Y)mods($Z).
Sums,dierenes,and produtsneednofurther disussion. TheDIVommand
formsthequotientandremainderasdenedinSetion1.2.4;theremaindergoes
into thespeial remainder register rR,where it an be examinedby using the
instrutionGET$X,rRdesribedbelow. Ifthedivisor$Ziszero,DIVsets$X 0
andrR $Y(seeEq.1.2.4{(1));an\integer dividehek"alsoours.
ADDU$X,$Y,$Z(addunsigned): u($X) u($Y)+u($Z)
mod2 64
.
SUBU$X,$Y,$Z(subtratunsigned): u($X) u($Y) u($Z)
mod2 64
.
MULU$X,$Y,$Z(multiplyunsigned): u(rH$X) u($Y)u($Z).
DIVU $X,$Y,$Z (divide unsigned): u($X)
u(rD$Y)=u($Z)
, u(rR)
u(rD$Y)modu($Z),ifu($Z)>u(rD); otherwise$X rD, rR $Y.
Arithmetionunsigned numbersneverausesoverow. A full16-byte produt
isformedbytheMULUommand,andtheupperhalfgoesintothespeialhimult
register rH. For example, when the unsigned number
#
9e3779b97f4a716 in
(2)and(4)aboveismultipliedbyitselfweget
rH
#
618864680b583ea; $X
#
1bb32095dd51e4: (7)
Inthis asethevalueofrHhasturnedoutto beexatly 2 64
minus theoriginal
number
#
9e3779b97f4a716;this isnota oinidene! Thereasonis that(2)
atuallygives the rst64 bits of thebinary representation of the golden ratio
1
= 1, if we plae a binary radix point at the left. (See Table 2 in
AppendixA.) Squaringgivesusanapproximationtothebinaryrepresentation
of 2
=1
1
,withtheradixpointnowattheleftofrH.
Division with DIVU yields the 8-byte quotient and remainder of a 16-byte
dividend with respet to an 8-byte divisor. The upper half of the dividend
appears in the speial dividend register rD, whih is zero at the beginning of
a program; this register an be set to any desired value with the ommand
PUT rD,$Z desribed below. If rD is greater than or equal to the divisor,
DIVU $X,$Y,$Z simply sets $X rD and rR $Y. (This ase always arises
when$Ziszero.) ButDIVU neverausesaninteger dividehek.
The ADDU instrution omputes a memory address A, aording to deni-
tion(5); therefore,as disussedearlier,wesometimesgiveADDU thealternative
nameLDA . Thefollowingrelatedommandsalsohelpwithaddressalulation.
2ADDU$X,$Y,$Z(times2andaddunsigned):
u($X) u($Y)2+u($Z)
mod2 64
.
4ADDU$X,$Y,$Z(times4andaddunsigned):
u($X) u($Y)4+u($Z)
mod2 64
.
8ADDU$X,$Y,$Z(times8andaddunsigned):
u($X) u($Y)8+u($Z)
mod2 64
.
16ADDU$X,$Y,$Z(times16andaddunsigned):
u($X) u($Y)16+u($Z)
mod2 64
.
It is faster to exeutetheommand 2ADDU $X,$Y,$Y than to multiplyby 3,if
overowisnotanissue.
NEG$X,Y,$Z(negate): s($X) Y s($Z).
NEGU$X,Y,$Z(negateunsigned): u($X) Y u($Z)
mod2 64
.
In these ommands Y is simply an unsigned onstant, not a register number
(justasXwasanunsignedonstantintheSTCOinstrution). UsuallyYiszero,
inwhihaseweanwritesimplyNEG $X,$ZorNEGU $X,$Z.
SL$X,$Y,$Z(shift left): s($X) s($Y)2 u($Z)
.
SLU$X,$Y,$Z(shift leftunsigned): u($X) u($Y)2 u($Z)
mod2 64
.
SR$X,$Y,$Z(shift right): s($X)
s($Y)=2 u($Z)
.
SRU$X,$Y,$Z(shift rightunsigned): u($X)
u($Y)=2 u($Z)
.
SL and SLU both produe the same result in $X, but SL might overow while
SLU neverdoes. SRextends thesignwhenshiftingright,but SRUshiftszerosin
from theleft. ThereforeSRand SRU produethe sameresultin $Xif andonly
if$Yis nonnegativeor $Zis zero. TheSLand SRinstrutions aremuh faster
thanMUL andDIVbypowersof2. AnSLUinstrutionismuhfaster thanMULU
byapowerof2,althoughitdoesnotaetrHasMULUdoes. AnSRUinstrution
ismuhfasterthanDIVUbyapowerof2,althoughitisnotaetedbyrD. The
notationyz isoftenusedtodenotetheresultofshiftingabinaryvaluey to
theleftbyzbits;similarly,yzdenotesshiftingtotheright.
CMP$X,$Y,$Z(ompare):
s($X)
s($Y)>s($Z)
s($Y)<s($Z)
.
CMPU$X,$Y,$Z(ompareunsigned):
s($X)
u($Y)>u($Z)
u($Y)<u($Z)
.
These instrutions eah set $X to either 1, 0, or 1, depending on whether
register$Yislessthan, equalto,or greaterthanregister$Z.
Conditionalinstrutions. Severalinstrutions basetheirationsonwhether
aregisterispositive,ornegative,orzero,et.
CSN$X,$Y,$Z(onditionalsetifnegative): ifs($Y)<0,set$X $Z.
CSZ$X,$Y,$Z(onditionalsetifzero): if $Y=0,set$X $Z.
CSP$X,$Y,$Z(onditionalsetifpositive): ifs($Y)>0,set$X $Z.
CSOD$X,$Y,$Z(onditionalsetifodd): ifs($Y)mod2=1,set$X $Z.
CSNN$X,$Y,$Z(onditionalsetifnonnegative): ifs($Y)0,set$X $Z.
CSNZ$X,$Y,$Z(onditionalsetifnonzero): if $Y6=0,set $X $Z.
CSNP$X,$Y,$Z(onditionalsetifnonpositive): ifs($Y)0,set$X $Z.
CSEV$X,$Y,$Z(onditionalsetifeven): ifs($Y)mod2=0,set$X $Z.
Ifregister$Ysatisesthestatedondition,register$Zisopied toregister $X;
otherwise nothing happens. A register is negative if and only if its leading
(leftmost)bitis1. Aregisterisoddifandonlyifitstrailing(rightmost)bitis1.
ZSN$X,$Y,$Z(zeroorset ifnegative): $X $Z[s($Y)<0℄.
ZSZ$X,$Y,$Z(zeroorset ifzero): $X $Z[$Y=0℄.
ZSP$X,$Y,$Z(zeroorset ifpositive): $X $Z[s($Y)>0℄.
ZSOD$X,$Y,$Z(zeroorset ifodd): $X $Z[s($Y)mod2=1℄.
ZSNN$X,$Y,$Z(zeroorset ifnonnegative): $X $Z[s($Y)0℄.
ZSNZ$X,$Y,$Z(zeroorset ifnonzero): $X $Z[$Y6=0℄.
ZSNP$X,$Y,$Z(zeroorset ifnonpositive): $X $Z[s($Y)0℄.
ZSEV$X,$Y,$Z(zeroorset ifeven): $X $Z[s($Y)mod2=0℄.
Ifregister$Ysatisesthestatedondition,register$Zisopied toregister $X;
otherwiseregister$Xissetto zero.
Bitwise operations. Weoften nd it useful to think of an otabyte xas a
vetor v(x)of 64individual bits, and to performoperationssimultaneously on
eahomponentoftwo suhvetors.
AND$X,$Y,$Z(bitwise and): v($X) v($Y)^v($Z).
OR$X,$Y,$Z(bitwise or): v($X) v($Y)_v($Z).
XOR$X,$Y,$Z(bitwise exlusive-or): v($X) v($Y)v($Z).
ANDN$X,$Y,$Z(bitwise and-not): v($X) v($Y)^v($Z).
ORN$X,$Y,$Z(bitwise or-not): v($X) v($Y)_v($Z).
NAND$X,$Y,$Z(bitwise not-and): v($X) v($Y)^v($Z).
NOR$X,$Y,$Z(bitwise not-or): v($X) v($Y)_v($Z).
NXOR$X,$Y,$Z(bitwise not-exlusive-or): v($X) v($Y)v($Z).
Here v denotes the omplement of vetor v, obtained by hanging 0 to 1 and
1to0. Thebinaryoperations^,_,and,denedbytherules
0^0=0;
0^1=0;
1^0=0;
1^1=1;
0_0=0;
0_1=1;
1_0=1;
1_1=1;
00=0;
01=1;
10=1;
11=0;
(8)
are applied independently to eah bit. Anding is the same as multiplying or
takingtheminimum;oringisthesameastakingthemaximum. Exlusive-oring
isthesame asaddingmod 2.
MUX$X,$Y,$Z(bitwisemultiplex): v($X) v($Y)^v(rM) _ v($Z)^v(rM ) .
TheMUX operationombinestwo bitvetorsbylooking atthespeialmultiplex
maskregisterrM,hoosingbitsof$YwhererMis1andbitsof$ZwhererMis0.
SADD$X,$Y,$Z(sidewaysadd): s($X) s P
v($Y)^v($Z) .
TheSADD operationountsthenumberofbitpositionsinwhihregister$Yhas
a1 whileregister$Zhasa 0.
Bytewiseoperations. Similarly,weanregardanotabytexasavetorb(x)
ofeightindividual bytes, eah ofwhihis aninteger between0 and 255;or we
anthinkofitasavetor w(x)offourindividualwydes,ora vetort(x)oftwo
unsignedtetras. Thefollowingoperationsdealwithallomponentsatone.
BDIF$X,$Y,$Z(bytedierene): b($X) b($Y) .
b($Z).
WDIF$X,$Y,$Z(wydedierene): w($X) w($Y) .
w($Z).
TDIF$X,$Y,$Z(tetradierene): t($X) t($Y) .
t($Z).
ODIF$X,$Y,$Z(otadierene): u($X) u($Y) .
u($Z).
Here .
denotestheoperationofsaturating subtration,
y .
z = max(0;y z): (9)
These operations haveimportant appliations to text proessing, as wellas to
omputer graphis(when thebytesor wydesrepresentpixel values). Exerises
27{30disusssome oftheirbasi properties.
We an alsoregardan otabyte as an 88 Boolean matrix,that is, as an
88arrayof0sand1s. Letm(x)bethematrixwhoserowsfromtoptobottom
are thebytes of xfrom left to right; and let m T
(x) bethe transposed matrix,
whoseolumns arethebytesofx. Forexample,ifx=
#
9e3779b97f4a716is
theotabyte (2) ,wehave
m(x)= 0
B
B
B
B
B
B
B
B
B
1 0 0 1 1 1 1 0
0 0 1 1 0 1 1 1
0 1 1 1 1 0 0 1
1 0 1 1 1 0 0 1
0 1 1 1 1 1 1 1
0 1 0 0 1 0 1 0
0 1 1 1 1 1 0 0
0 0 0 1 0 1 1 0 1
C
C
C
C
C
C
C
C
C
A
; m
T
(x)= 0
B
B
B
B
B
B
B
B
B
1 0 0 1 0 0 0 0
0 0 1 0 1 1 1 0
0 1 1 1 1 0 1 0
1 1 1 1 1 0 1 1
1 0 1 1 1 1 1 0
1 1 0 0 1 0 1 1
1 1 0 0 1 1 0 1
0 1 1 1 1 0 0 0 1
C
C
C
C
C
C
C
C
C
A
: (10)
Thisinterpretation ofotabytessuggests two operationsthat arequitefamiliar
tomathematiians,but wewillpauseamomenttodenethemfrom srath.
IfAisanmnmatrixandB isannsmatrix,andifÆandarebinary
operations,thegeneralizedmatrixprodut A Æ
B isthemsmatrixCdened
by
C
ij
=(A
i1 B
1j ) Æ (A
i2 B
2j
)Æ Æ (A
in B
nj
) (11)
for 1 i m and 1 j s. [See K.E. Iverson, A Programming Language
(Wiley, 1962), 23{24; we assume that Æ is assoiative.℄ An ordinary matrix
produtisobtainedwhenÆis+andis,butweobtainimportantoperations
onBooleanmatriesifwelet Æbe_or:
(A _
B)
ij
=A
i1 B
1j _A
i2 B
2j
__A
in B
nj
; (12)
(A
B)
ij
=A
i1 B
1j A
i2 B
2j
A
in B
nj
: (13)
NotiethatiftherowsofAeahontainatmostone1,atmostonetermin(12)
or(13) isnonzero. Thesame is trueiftheolumns of B eah ontainat most
one1. ThereforeA _
B andA
B bothturnouttobethesameastheordinary
matrixprodutA +
B=ABin suh ases.
MOR$X,$Y,$Z(multipleor): m T
($X) m T
($Y) _
m
T
($Z);
equivalently,m($X) m($Z) _
m($Y). (Seeexerise 32.)
MXOR$X,$Y,$Z(multiple exlusive-or): m T
($X) m T
($Y)
m
T
($Z);
equivalently,m($X) m($Z)
m($Y).
Theseoperationsessentially seteahbyteof$Xbylookingattheorresponding
byte of$Z and usingits bits to selet bytes of $Y; theseleted bytes are then
oredorxoredtogether. If,forexample,wehave
$Z=
#
0102040810204080; (14)
thenbothMOR andMXOR willset register$Xto thebytereversal ofregister $Y:
Thekthbytefromtheleftof$Xwillbesettothekthbytefromtherightof$Y,
for1k8. Ontheother hand if $Z=
#
00000000000000ff, MOR andMXOR
willsetallbytesof$Xtozeroexeptfortherightmostbyte,whihwillbeome
eithertheORortheXORofalleightbytesof$Y. Exerises33{37illustratesome
ofthemanypratialappliations oftheseversatile ommands.
Floatingpointoperators. MMIXinludesafullimplementationofthefamous
IEEE/ANSIStandard754foroatingpointarithmeti. Completedetailsofthe
oatingpointoperationsappearin Setion4.2 andin theMMIXware doument;
aroughsummary willsuÆeforourpurposeshere.
Every otabyte x represents a oating binary number f(x) determined as
follows: Theleftmostbitofxisthesign(0=`+',1=` ');thenext11bitsare
theexponent E;theremaining52bitsarethefration F.Thevaluerepresented
isthen
0:0,ifE=F=0(zero);
2 1074
F,ifE=0and F6=0 (denormal);
2 E 1023
(1+F=2 52
),if0<E<2047(normal);
1,ifE=2047andF=0(innite);
NaN(F=2 52
),ifE=2047andF6=0(Not-a-Number).
The\short" oating point numberf(t) represented bya tetrabyte t is similar,
but its exponent part has only8 bits and itsfration hasonly 23; the normal
ase0<E<255ofa shortoatrepresents2 E 127
(1+F=2 23
).
FADD$X,$Y,$Z(oatingadd): f($X) f($Y)+f($Z).
FSUB$X,$Y,$Z(oatingsubtrat): f($X) f($Y) f($Z).
FMUL$X,$Y,$Z(oatingmultiply): f($X) f($Y)f($Z).
FDIV$X,$Y,$Z(oatingdivide): f($X) f($Y)=f($Z).
FREM$X,$Y,$Z(oatingremainder): f($X) f($Y)remf($Z).
FSQRT$X,$Zor FSQRT$X,Y,$Z(oatingsquareroot): f($X) f($Z) 1=2
.
FINT$X,$Zor FINT$X,Y,$Z(oatinginteger): f($X) intf($Z).
FCMP$X,$Y,$Z(oatingompare):s($X) [f($Y)>f($Z)℄ [f($Y)<f($Z)℄.
FEQL$X,$Y,$Z(oatingequalto): s($X) [f($Y)=f($Z)℄.
FUN$X,$Y,$Z(oatingunordered): s($X) [f($Y)kf($Z)℄.
FCMPE$X,$Y,$Z(oatingomparewithrespettoepsilon):
s($X)
f($Y)f($Z) f(rE)
f($Y)f($Z) f(rE)
,see4.2.2{(21).
FEQLE$X,$Y,$Z(oatingequivalentwithrespettoepsilon):
s($X)
f($Y)f($Z) f(rE)
,see4.2.2{(24).
FUNE$X,$Y,$Z(oatingunorderedwithrespettoepsilon):
s($X)
f($Y)kf($Z) f(rE)
.
FIX$X,$Z orFIX$X,Y,$Z(onvertoatingtoxed): s($X) intf($Z).
FIXU$X,$Zor FIXU$X,Y,$Z(onvert oatingtoxedunsigned):
u($X) intf($Z)
mod2 64
.
FLOT$X,$Zor FLOT$X,Y,$Z(onvert xedtooating): f($X) s($Z).
FLOTU$X,$Zor FLOTU$X,Y,$Z(onvertxedto oatingunsigned):
f($X) u($Z).
SFLOT$X,$Zor SFLOT$X,Y,$Z(onvertxedto shortoat):
f($X) f(T) s($Z).
SFLOTU$X,$ZorSFLOTU$X,Y,$Z(onvertxedtoshort oatunsigned):
f($X) f(T) u($Z).
LDSF$X,$Y,$ZorLDSF $X,A(loadshortoat): f($X) f(M
4 [A℄).
STSF$X,$Y,$ZorSTSF $X,A(storeshort oat): f(M
4
[A℄) f($X).
Assignment to a oating point quantity uses the urrent rounding mode to
determinetheappropriatevaluewhenanexatvalueannot beassigned. Four
roundingmodesaresupported: 1 (ROUND_OFF),2 (ROUND_UP),3 (ROUND_DOWN),
and4(ROUND_NEAR). TheYeldofFSQRT ,FINT ,FIX ,FIXU ,FLOT ,FLOTU,SFLOT ,
andSFLOTUanbeusedtospeifyaroundingmodeotherthantheurrentone,
ifdesired. Forexample,FIX $X,ROUND_UP,$Zsetss($X)
f($Z)
. Operations
SFLOTandSFLOTUrstroundasifstoringintoananonymoustetrabyteT,then
theyonvertthat numbertootabyteform.
The `int' operationrounds to an integer. The operationyremz isdened
tobey nz, wherenis thenearestintegerto y=z,or thenearesteven integer
inaseofa tie. Speialrulesapplywhentheoperandsareinniteor NaN,and
speialonventionsgovernthe signofa zeroresult. The values+0:0 and 0:0
havedierentoatingpointrepresentations,butFEQLallsthemequal. Allsuh
tehnialitiesareexplainedintheMMIXware doument,andSetion4.2explains
whythetehnialities areimportant.
Immediate onstants. Programs often need to deal with small onstant
numbers. For example, we might want to add or subtrat 1 from a register,
or we mightwant to shiftby 32, et. Insuh ases it's a nuisane to load the
small onstant frommemory into anotherregister. So MMIX providesa general
mehanism by whih suh onstants an be obtained \immediately" from an
instrution itself: Every instrution we have disussed so far has a variant in
whih $Z is replaed by the number Z, unless the instrution treats $Z as a
oatingpointnumber.
For example, `ADD $X,$Y,$Z' has a ounterpart `ADD $X,$Y,Z', meaning
s($X) s($Y)+Z; `SRU$X,$Y,$Z' hasa ounterpart `SRU$X,$Y,Z', meaning
u($X)
u($Y)=2 Z
; `FLOT $X,$Z ' has a ounterpart `FLOT $X,Z ', meaning
f($X) Z. But`FADD $X,$Y,$Z'hasnoimmediateounterpart.
The opode for `ADD $X,$Y,$Z' is
#
20 and the opode for `ADD $X,$Y,Z'
is
#
21; weusethesamesymbolADDin bothases forsimpliity. Ingeneralthe
opodefortheimmediatevariantofanoperationisonegreaterthantheopode
fortheregistervariant.
Several instrutions also feature wyde immediate onstants, whih range
from
#
0000 =0 to
#
ffff =65535. These onstants, whih appearin theYZ
bytes, an be shifted into the high, medium high, medium low, or low wyde
positionsofanotabyte.
SETH$X,YZ(sethighwyde): u($X) YZ2 48
.
SETMH$X,YZ(setmediumhigh wyde): u($X) YZ2 32
.
SETML$X,YZ(setmediumlowwyde): u($X) YZ2 16
.
SETL$X,YZ(setlowwyde): u($X) YZ.
INCH$X,YZ(inreasebyhighwyde): u($X) u($X)+YZ2 48
mod2 64
.
INCMH$X,YZ(inreasebymediumhigh wyde):
u($X) u($X)+YZ2 32
mod2 64
.
INCML$X,YZ(inreasebymediumlowwyde):
u($X) u($X)+YZ2 16
mod2 64
.
INCL$X,YZ(inreasebylowwyde): u($X) u($X)+YZ
mod2 64
.
ORH$X,YZ(bitwiseor withhighwyde): v($X) v($X)_v(YZ48).
ORMH$X,YZ(bitwiseorwith mediumhighwyde):
v($X) v($X)_v(YZ32).
ORML$X,YZ(bitwiseorwith mediumlowwyde):
v($X) v($X)_v(YZ16).
ORL$X,YZ(bitwiseor withlowwyde): v($X) v($X)_v(YZ).
ANDNH$X,YZ(bitwise and-nothighwyde): v($X) v($X)^v(YZ 48).
ANDNMH$X,YZ(bitwise and-notmediumhighwyde):
v($X) v($X)^v(YZ32).
ANDNML$X,YZ(bitwise and-notmediumlowwyde):
v($X) v($X)^v(YZ16).
ANDNL$X,YZ(bitwise and-notlowwyde): v($X) v($X)^v(YZ ).
Usingat mostfouroftheseinstrutions,weangetanydesiredotabyteintoa
registerwithoutloadinganythingfromthememory. Forexample,theommands
SETH$0,#0123; INCMH$0,#4567; INCML $0,#89ab; INCL $0,#def
put
#
0123456789abdef intoregister $0.
TheMMIX assembly language allows usto write SET as anabbreviation for
SETL ,andSET$X,$YasanabbreviationfortheommonoperationOR$X,$Y,0.
Jumps and branhes. Instrutions are normally exeuted in their natural
sequene. Inotherwords,theommandthatisperformedafterMMIXhasobeyed
thetetrabytein memoryloationisnormallythetetrabyte foundinmemory
loation+4. (Thesymboldenotestheplaewherewe're\at.") Butjump
andbranhinstrutions allowthissequenetobeinterrupted.
JMPRA(jump): RA.
Here RA denotes a three-byte relative address, whih ould be written more
expliitlyas+4XYZ ,namelyXYZtetrabytesfollowingtheurrentloation.
Forexample,`JMP+4*2 'isasymboliformforthetetrabyte
#
f0000002;ifthis
instrutionappearsin loation
#
1000, thenext instrutiontobeexeuted will
betheonein loation
#
1008. Wemightinfat write`JMP#1008 ';butthenthe
valueofXYZwould depend ontheloationjumpedfrom.
Relative osets an also be negative, in whih ase the opode inreases
by1 andXYZis theosetplus2 24
. Forexample, `JMP-4*2 ' isthetetrabyte
#
f1fffffe. Opode
#
f0tells theomputerto\jumpforward"andopode
#
f1
tells it to \jump bakward," but we write both as JMP . In fat, we usually
writesimply `JMPAddr 'when wewanttojumpto loationAddr ,and theMMIX
assemblyprogramguresouttheappropriateopodeandtheappropriatevalue
ofXYZ.Suhajumpwillbepossible unlesswetrytostraymorethanabout67
millionbytesfromourpresentloation.
GO$X,$Y,$Z(go): u($X) +4,then A.
TheGOinstrutionallowsustojumptoanabsoluteaddress,anywhereinmem-
ory;thisaddressAisalulatedbyformula(5) ,exatlyasintheload andstore
ommands. Beforegoingtothespeiedaddress,theloationoftheinstrution
that would ordinarilyhave ome nextis plaed into register $X. Therefore we
ould return to that loation later by saying, for example, `GO $X,$X,0', with
Z=0as animmediateonstant.
BN$X,RA(branhifnegative): ifs($X)<0,set RA.
BZ$X,RA(branhifzero): if $X=0,set RA.
BP$X,RA(branhifpositive): ifs($X)>0,set RA.
BOD$X,RA (branhifodd): ifs($X)mod2=1,set RA.
BNN$X,RA (branhifnonnegative): ifs($X)0,set RA.
BNZ$X,RA (branhifnonzero): if $X6=0,set RA.
BNP$X,RA (branhifnonpositive): ifs($X)0,set RA.
BEV$X,RA (branhifeven): ifs($X)mod2=0,set RA.
A branh instrution is a onditional jump that depends on the ontents of
register$X. TherangeofdestinationaddressesRA ismorelimited thanitwas
withJMP,beauseonlytwobytesareavailabletoexpresstherelativeoset;but
stillweanbranhtoanytetrabytebetween 2 18
and+2 18
4.
PBN$X,RA (probablebranhifnegative): ifs($X)<0,set RA.
PBZ$X,RA (probablebranhifzero): if $X=0,set RA.
PBP$X,RA (probablebranhifpositive): ifs($X)>0,set RA.
PBOD$X,RA(probable branhifodd): ifs($X)mod2=1,set RA.
PBNN$X,RA(probable branhifnonnegative): ifs($X)0,set RA.
PBNZ$X,RA(probablebranhifnonzero): if $X6=0,set RA.
PBNP$X,RA(probablebranhifnonpositive): ifs($X)0,set RA.
PBEV$X,RA(probablebranhifeven): ifs($X)mod2=0,set RA.
High-speedomputersusuallyworkfastestiftheyanantiipatewhenabranh
will be taken, beause foreknowledge helps them look ahead and get ready for
futureinstrutions. ThereforeMMIXenouragesprogrammerstogivehintsabout
whetherbranhing islikelyornot. Whenevera branh isexpeted tobetaken
morethanhalfofthetime,a wiseprogrammerwillsayPBinsteadofB.
*Subroutine alls. MMIX also hasseveral instrutions that failitateeÆient
ommuniationbetweensubprograms,viaaregisterstak. Thedetailsaresome-
whattehnialandwewilldeferthemuntilSetion1.4;aninformaldesription
willsuÆehere. Shortprogramsdonotneedtousethesefeatures.
PUSHJ$X,RA (pushregisters and jump): push(X)and set rJ +4, then
set RA.
PUSHGO$X,$Y,$Z(pushregistersandgo): push(X)andsetrJ +4,then
set A.
Thespeialreturn-jumpregisterrJissettotheaddressofthetetrabytefollowing
thePUSH ommand. Theation\push(X)"means,roughly speaking, thatloal
registers $0 through $X are saved and made temporarily inaessible. What
used to be $(X+1) is now $0, what used to be $(X+2) is now $1, et. But
allregisters$k for krGremainunhanged; rGis thespeialglobal threshold
register,whosevaluealwaysliesbetween32and255,inlusive.
Register$kisalledglobal ifkrG. Itisalledloalifk<rL;hererListhe
speialloal thresholdregister,whihtellshowmanyloalregistersareurrently
ative. Otherwise, namelyif rL k <rG, register $k is alled marginal,and
$k is equal to zerowheneverit is used as a soureoperandin a ommand. If
a marginal register $k is used as a destination operand in a ommand, rL is
automatially inreased to k+1 before the ommand is performed, thereby
making$k loal.
POPX,YZ(popregistersandreturn): pop(X),then rJ+4YZ.
Here \pop(X)" means, roughly speaking, that all but X of the urrent loal
registersbeomemarginal,andthentheloalregistershiddenbythemostreent
\push"thathasnotyetbeen\popped"arerestoredtotheirformervalues. Full
detailsappearin Setion1.4,togetherwithnumerousexamples.
SAVE$X,0(saveproessstate): u($X) ontext.
UNSAVE$Z(restoreproessstate): ontext u($Z).
The SAVE instrution stores all urrent registers in memory at the top of the
registerstak, andputs theaddressofthetopmost storedotabyteinto u($X).
Register$Xmustbeglobal; thatis,X mustberG. Alloftheurrentlyloal
and global registers are saved, together with speial registers like rA, rD, rE,
rG, rH, rJ, rM, rR, and several others that we have not yet disussed. The
UNSAVE instrution takes the address of suh a topmost otabyte and restores
theassoiatedontext, essentially undoing apreviousSAVE. The valueofrL is
set to zeroby SAVE ,but restored by UNSAVE. MMIX has speial registers alled
theregister stak oset (rO) andregister stak pointer (rS), whih ontrol the
PUSH , POP , SAVE , and UNSAVE operations. (Again, full details an befound in
Setion1.4.)
*System onsiderations. Several opodes, intended primarily for ultrafast
and/or parallel versions of the MMIX arhiteture, are of interest only to ad-
vanedusers,but weshouldatleastmentionthemhere. Someoftheassoiated
operations are similar to the \probable branh" ommands, in the sense that
theygivehintstothemahineabouthowtoplanaheadformaximumeÆieny.
Mostprogrammersdonotneedtousetheseinstrutions,exeptperhapsSYNCID .
LDUNC$X,$Y,$Z(loadotaunahed): s($X) s M
8 [A℄
.
STUNC$X,$Y,$Z(storeotaunahed): s M
8 [A℄
s($X).
These ommands perform the same operationsas LDO and STO, but they also
inform the mahine that theloaded or stored otabyte and its near neighbors
willprobablynotbereadorwritten inthenearfuture.
PRELDX,$Y,$Z(preloaddata).
SaysthatmanyofthebytesM[A℄throughM[A+X℄willprobablybeloaded or
storedinthenearfuture.
PRESTX,$Y,$Z(prestoredata).
Says that all of the bytes M[A℄ through M[A+X℄ will denitely be written
(stored)beforetheyarenextread(loaded).
PREGOX,$Y,$Z(prefethtogo).
Says that manyof thebytes M[A℄ throughM[A+X℄ will probably be used as
instrutionsin thenearfuture.
SYNCIDX,$Y,$Z(synhronizeinstrutions anddata).
SaysthatallofthebytesM[A℄throughM[A+X℄mustbefethedagainbefore
beinginterpreted as instrutions. MMIX is allowed to assumethat a program's
instrutionsdonothangeaftertheprogramhasbegun,unlesstheinstrutions
havebeenpreparedbySYNCID. (Seeexerise57.)
SYNCDX,$Y,$Z(synhronizedata).
Says that all of bytes M[A℄ throughM[A+X℄ must bebroughtup to date in
the physial memory, so that other omputers and input/output devies an
readthem.
SYNCXYZ (synhronize).
Restrits parallel ativities so that dierent proessors an ooperate reliably;
seeMMIXware fordetails. XYZmustbe0,1,2,or 3.
CSWAP$X,$Y,$Z(ompareand swapotabytes).
Ifu(M
8
[A℄)=u(rP),whererPisthespeialpreditionregister,setu(M
8 [A℄)
u($X)and u($X) 1. Otherwiseset u(rP) u(M
8
[A℄) and u($X) 0. This
isanatomi(indivisible) operation,usefulwhenindependentomputerssharea
ommonmemory.
LDVTS$X,$Y,$Z(loadvirtualtranslationstatus).
Thisinstrution,desribedinMMIXware,isfortheoperatingsystemonly.
*Interrupts. The normal ow of instrutions from one tetrabyte to the next
an be hanged not only by jumps and branhes but also by less preditable
events like overow or external signals. Real-world mahines must also ope
withsuhthingsasseurityviolationsandhardwarefailures. MMIXdistinguishes
two kindsof programinterruptions: \trips" and \traps." A tripsends ontrol
toa trip handler,whih ispart oftheuser'sprogram;atrap sendsontrol toa
trap handler,whih ispartoftheoperatingsystem.
Eight kinds of exeptional onditions an arise when MMIX is doing arith-
meti, namely integer divide hek (D), integer overow (V), oat-to-x over-
ow (W), invalid oating operation (I), oating overow (O), oating under-
ow (U), oating division by zero (Z), and oating inexat (X). The speial
arithmeti status register rA holds urrent information about all these exep-
tions. Theeightbitsofitsrightmost byte arealleditseventbits,andtheyare
namedD_BIT(
#
80),V_BIT(
#
40),:::,X_BIT (
#
01),inorderDVWIOUZX.
Theeight bits justto theleft of theeventbits in rA are alled theenable
bits; theyappearin the same orderDVWIOUZX. Whenan exeptional ondi-
tionours duringsome arithmeti operation, MMIX looksat theorresponding
enable bitbefore proeedingto thenext instrution. Iftheenable bit is 0,the
orrespondingeventbitissetto1;otherwisethemahineinvokesatriphandler
by \tripping" to loation
#
10 for exeption D,
#
20 for exeption V, :::,
#
80
forexeptionX. Thus theevent bitsofrA reordtheexeptionsthathavenot
ausedtrips. (Ifmorethanoneenabledexeptionours,theleftmostonetakes
preedene. For example,simultaneous Oand XishandledbyO.)
ThetwobitsofrAjusttotheleftoftheenablebitsholdtheurrentrounding
mode, mod 4. The other46 bits ofrA should bezero. A programan hange
thesettingofrAat anytime,usingthePUTommand disussedbelow.
TRIPX,Y,ZorTRIP X,YZorTRIP XYZ(trip).
Thisommandforesatripto thehandlerat loation
#
00.
Wheneveratripours,MMIXusesvespeialregisterstoreordtheurrent
state: thebootstrapregisterrB,thewhere-interruptedregisterrW,theexeution
register rX,the Yoperand register rY, andtheZ operandregister rZ.First rB
isset to$255,then$255issettorJ,andrWissetto +4. Thelefthalf ofrX
issetto
#
80000000, andtherighthalfisset totheinstrutionthattripped. If
theinterruptedinstrutionwas nota storeommand,rYis setto$YandrZ is
set to $Z(or to Z in ase ofan immediate onstant); otherwise rYis set to A
(thememoryaddress of thestoreommand) andrZ is set to$X (thequantity
tobestored). Finallyontrolpasses tothehandlerbysettingto thehandler
address(
#
00or
#
10 oror
#
80).
TRAPX,Y,ZorTRAP X,YZorTRAP XYZ(trap).
ThisommandisanalogoustoTRIP ,butitforesatraptotheoperatingsystem.
SpeialregistersrBB, rWW,rXX, rYY,and rZZtaketheplaeof rB,rW,rX,
rY,andrZ;thespeialtrap address register rTsuppliestheaddressofthetrap
handler,whih is plaed in . Setion 1.3.2desribesseveral TRAP ommands
that provide simple input/output operations. The normal way to onlude a
programis tosay `TRAP0 '; this instrutionisthe tetrabyte 00000000, so you
mightrunintoitbymistake.
The MMIXware doument gives further details about external interrupts,
whih are governed by the speial interrupt mask register rK and interrupt
requestregister rQ.Dynamitraps,whih arisewhenrK^rQ6=0,arehandled
ataddress rTTinsteadof rT.
RESUME0(resumeafter interrupt).
If s(rX) is negative, MMIX simply sets rW and takes its next instrution
fromthere. Otherwise,iftheleading byteofrX iszero,MMIX sets rW 4
and exeutes the instrution in the lower half of rX as if it had appeared in
that loation. (This feature an be used even if no interrupt has ourred.
Theinserted instrutionmust notitself beRESUME.) OtherwiseMMIX performs
speialationsdesribedin theMMIXwaredoumentandofinterestprimarilyto
theoperatingsystem;seeexerise 1.4.3{14.
The omplete instrutionset. Table1showsthesymbolinamesofall256
opodes,arrangedbytheirnumerivaluesinhexadeimalnotation. Forexample,
ADDappearsin theupperhalfoftherowlabeled
#
2xandintheolumnlabeled
#
0 at the top,so ADD is opode
#
20; ORL appearsin the lower half of therow
labeled
#
Exandintheolumnlabeled
#
Batthebottom,soORLisopode
#
EB.
Table 1 atually says `ADD[I℄ ', not `ADD ', beause the symbol ADD really
standsfortwoopodes. Opode
#
20arisesfromADD$X,$Y,$Zusingregister$Z,
while opode
#
21 arises from ADD $X,$Y,Z using the immediate onstant Z.
Whenadistintionisneessary,wesaythatopode
#
20isADDandopode
#
21
isADDI(\add immediate");similarly,
#
F0 isJMPand
#
F1 isJMPB(\jumpbak-
ward"). Thisgiveseveryopodeauniquename. However,theextraIandBare
generallydroppedforonveniene whenwewriteMMIXprograms.
Wehavedisussednearly allof MMIX'sopodes. Twoofthestragglersare
GET$X,Z (getfrom speialregister): u($X) u(g[Z℄),where0Z<32.
PUTX,$Z (putintospeialregister): u(g[X℄) u($Z),where0X<32.
Eahspeialregisterhasaodenumberbetween0and31. Wespeakofregisters
rA,rB,:::,asaidstohumanunderstanding;butregisterrAisreallyg[21℄from
themahine'spointofview,andregisterrBisreallyg[0℄,et. Theodenumbers
appearin Table2onpage21.
GETommandsareunrestrited,butertainthingsannotbePUT: Novalue
anbeputintorGthatisgreaterthan255,lessthan32,orlessthantheurrent
setting of rL. No valuean be put into rA that is greater than
#
3ffff. If a
program tries to inrease rL with the PUT ommand, rL will stay unhanged.
Moreover,aprogramannot PUTanythinginto rC,rN,rO,rS,rI, rT,rTT,rK,
rQ,rU,orrV;these\extraspeial"registershaveodenumbersintherange8{18.
Mostofthespeialregistershavealreadybeenmentionedinonnetionwith
spei instrutions,but MMIX also hasa \lokregister" or yle ounter, rC,
whih keeps advaning; an interval ounter, rI, whih keeps dereasing, and
whih requests an interruptwhen itreaheszero; a serial number register, rN,
whih gives eah MMIX mahine a unique number; a usage ounter, rU, whih
Table1
THE OPCODES OF MMIX
#
0
#
1
#
2
#
3
#
4
#
5
#
6
#
7
TRAP5 FCMP FUN FEQL FADD4 FIX4 FSUB4 FIXU4
#
0x
#
0x
FLOT [I℄4 FLOTU[I℄4 SFLOT [I℄4 SFLOTU[I℄4
FMUL4 FCMPE4 FUNE FEQLE4 FDIV40 FSQRT40 FREM4 FINT4
#
1x
#
1x
MUL [I℄10 MULU [I℄10 DIV[I℄60 DIVU [I℄60
ADD [I℄ ADDU [I℄ SUB[I℄ SUBU [I℄
#
2x
#
2x
2ADDU [I℄ 4ADDU[I℄ 8ADDU [I℄ 16ADDU[I℄
CMP [I℄ CMPU [I℄ NEG[I℄ NEGU [I℄
#
3x
#
3x
SL[I℄ SLU [I℄ SR[I℄ SRU [I℄
BN[B℄+ BZ [B℄+ BP[B℄+ BOD [B℄+
#
4x
#
4x
BNN [B℄+ BNZ [B℄+ BNP[B℄+ BEV [B℄+
PBN [B℄3 PBZ [B℄3 PBP[B℄3 PBOD [B℄3
#
5x
#
5x
PBNN [B℄3 PBNZ [B℄3 PBNP [B℄3 PBEV [B℄3
CSN [I℄ CSZ [I℄ CSP[I℄ CSOD [I℄
#
6x
#
6x
CSNN [I℄ CSNZ [I℄ CSNP [I℄ CSEV [I℄
ZSN [I℄ ZSZ [I℄ ZSP[I℄ ZSOD [I℄
#
7x
#
7x
ZSNN [I℄ ZSNZ [I℄ ZSNP [I℄ ZSEV [I℄
LDB [I℄+ LDBU [I℄+ LDW[I℄+ LDWU [I℄+
#
8x
#
8x
LDT [I℄+ LDTU [I℄+ LDO[I℄+ LDOU [I℄+
LDSF [I℄+ LDHT [I℄+ CSWAP [I℄2+2 LDUNC [I℄+
#
9x
#
9x
LDVTS [I℄ PRELD[I℄ PREGO [I℄ GO [I℄3
STB [I℄+ STBU [I℄+ STW[I℄+ STWU [I℄+
#
Ax
#
Ax
STT [I℄+ STTU [I℄+ STO[I℄+ STOU [I℄+
STSF [I℄+ STHT [I℄+ STCO [I℄+ STUNC [I℄+
#
Bx
#
Bx
SYNCD [I℄ PREST[I℄ SYNCID [I℄ PUSHGO[I℄3
OR[I℄ ORN [I℄ NOR[I℄ XOR [I℄
#
Cx
#
Cx
AND [I℄ ANDN [I℄ NAND [I℄ NXOR [I℄
BDIF [I℄ WDIF [I℄ TDIF [I℄ ODIF [I℄
#
Dx
#
Dx
MUX [I℄ SADD [I℄ MOR[I℄ MXOR [I℄
SETH SETMH SETML SETL INCH INCMH INCML INCL
#
Ex
#
Ex
ORH ORMH ORML ORL ANDNH ANDNMH ANDNML ANDNL
JMP [B℄ PUSHJ[B℄ GETA [B℄ PUT [I℄
#
Fx
#
Fx
POP3 RESUME5 [UN℄SAVE20+ SYNC SWYM GET TRIP5
#
8
#
9
#
A
#
B
#
C
#
D
#
E
#
F
=2ifthebranhistaken,=0ifthebranhisnottaken
inreasesby1wheneverspeiedopodesareexeuted;andavirtualtranslation
register,rV,whihdenesamappingfromthe\virtual"64-bitaddressesusedin
programstothe\atual"physialloationsof installedmemory. These speial
registers help make MMIX a omplete, viable mahine that ould atually be
builtand run suessfully; but they are not of importane to us in this book.
TheMMIXware doumentexplainsthem fully.
GETA$X,RA(getaddress): u($X) RA.
This instrutionloads a relative address into register $X, using the same on-
ventionsastherelativeaddressesinbranhommands. Forexample,GETA$0,
willset$0totheaddressoftheinstrutionitself.
Table2
SPECIAL REGISTERS OF MMIX
ode saved? put?
rA arithmetistatusregister . . . 21
p p
rB bootstrapregister(trip) . . . 0
p p
rC yleounter . . . 8
rD dividendregister . . . 1
p p
rE epsilon register . . . 2
p p
rF failureloationregister . . . 22
p
rG globalthresholdregister . . . 19
p p
rH himultregister . . . 3
p p
rI intervalounter . . . 12
rJ return-jumpregister . . . 4
p p
rK interruptmaskregister . . . 15
rL loalthresholdregister. . . 20
p p
rM multiplexmaskregister . . . 5
p p
rN serialnumber. . . 9
rO registerstakoset . . . 10
rP preditionregister. . . 23
p p
rQ interruptrequestregister. . . 16
rR remainderregister. . . 6
p p
rS registerstakpointer . . . 11
rT trapaddressregister. . . 13
rU usageounter. . . 17
rV virtualtranslationregister . . . 18
rW where-interruptedregister(trip) . . . 24
p p
rX exeutionregister(trip) . . . 25
p p
rY Yoperand(trip) . . . 26
p p
rZ Zoperand(trip) . . . 27
p p
rBB bootstrapregister(trap) . . . 7
p
rTT dynamitrapaddressregister. . . 14
rWW where-interruptedregister(trap) . . . 28
p
rXX exeutionregister(trap) . . . 29
p
rYY Yoperand(trap) . . . 30
p
rZZ Zoperand(trap) . . . 31
p
SWYMX,Y,Zor SWYMX,YZor SWYMXYZ (sympathizewithyourmahinery).
The last of MMIX's 256 opodes is, fortunately, the simplest of all. In fat, it
is often alled a no-op, beause it performs no operation. It does, however,
keepthemahine runningsmoothly,justas real-world swimming helpsto keep
programmershealthy. BytesX,Y, andZareignored.
Timing. In later parts of this book we will often want to ompare dierent
MMIX programsto see whih is faster. Suh omparisons aren't easy to make,
ingeneral,beausetheMMIX arhiteturean beimplementedinmanydierent
ways. Although MMIX is a mythial mahine, its mythial hardware exists in
heap,slowversions aswell asinostlyhigh-performanemodels. Therunning
timeofaprogramdependsnotonlyonthelokratebutalsoonthenumberof
funtionalunits thatanbeativesimultaneouslyandthedegreetowhihthey
arepipelined; itdependsonthetehniquesused toprefeth instrutions before
theyareexeuted; itdependsonthe sizeof therandom-aess memorythat is
used to give the illusion of 2 64
virtual bytes; and it depends on the sizes and
alloationstrategies ofahesandotherbuers,et.,et.
For pratial purposes,the runningtime ofanMMIX program an oftenbe
estimated satisfatorily by assigning a xed ost to eah operation, based on
the approximate running time that would be obtained on a high-performane
mahine withlots of main memory;so that's what wewill do. Eah operation
willbeassumedto takeanintegernumberof, where(pronouned\oops")*
is a unit that represents the lok yle time in a pipelined implementation.
Althoughthevalueofdereasesastehnologyimproves,wealwayskeepupwith
thelatestadvanesbeausewemeasure timeinunits of,notin nanoseonds.
Therunningtimeinourestimateswillalsobeassumedtodependonthenumber
ofmemoryreferenesor mems that aprogram uses;thisis thenumberofload
andstore instrutions. For example, wewill assume that eah LDO (load ota)
instrutionosts+,where istheaverageostofa memoryreferene. The
totalrunningtimeofaprogrammightbereportedas,say,35+1000,meaning
\35memsplus1000oops." Theratio=hasbeeninreasingsteadilyformany
years; nobody knows forsure whether this trend will ontinue, but experiene
hasshownthat and deservetobeonsideredindependently.
Table 1,whih isrepeatedalsoin theendpapers ofthis book,displays the
assumedrunningtimetogetherwitheahopode. Notiethatmostinstrutions
takejust1,whileloadsandstorestake+. Abranhorprobablebranhtakes
1if preditedorretly, 3 ifpredited inorretly. Floatingpoint operations
usuallytake4eah,althoughFDIVandFSQRTost40. Integer multipliation
takes10;integer divisionweighsinat 60.
Eventhough wewill often use the assumptions of Table 1 for seat-of-the-
pantsestimatesofrunningtime,wemustrememberthattheatualrunningtime
might be quite sensitive to the ordering of instrutions. For example, integer
divisionmightostonly oneyle ifwean nd60other thingsto dobetween
thetime we issue the ommandand the time we need theresult. Several LDB
(loadbyte)instrutions mightneedtoreferenememoryonlyone,iftheyrefer
to the same otabyte. Yet the result of a load ommand is usually notready
for use in the immediately following instrution. Experiene has shown that
some algorithmswork well with ahe memory, and others donot; therefore
is not really onstant. Even the loation of instrutions in memory an have
a signiant eet on performane, beause some instrutions an be fethed
togetherwithothers. ThereforetheMMIXwarepakageinludesnotonlyasimple
simulator, whih alulates running times by the rules of Table 1, but also a
omprehensivemeta-simulator,whihrunsMMIXprogramsunderawiderangeof
dierenttehnologialassumptions. Usersofthemeta-simulatoranspeifythe
* TheGreekletterupsilon()iswiderthananitalilettervee(v),buttheauthoradmits
thatthisdistintionisrathersubtle.Readerswhoprefertosayveeinsteadofoopsarefreeto
doastheywish. Thesymbolis,however,anupsilon.
harateristisofthememorybusandtheparametersofsuhthingsasahesfor
instrutions and data, virtual address translation, pipelining andsimultaneous
instrutionissue,branhpredition,et. Givenaongurationleandaprogram
le, the meta-simulator determines preisely how long the speied hardware
wouldneedtoruntheprogram. Onlythemeta-simulatoranbetrustedtogive
reliable information about a program's atual behavior in pratie; but suh
results an bediÆult to interpret, beause innitely manyongurations are
possible. That'swhyweoftenresorttothemuhsimplerestimatesofTable1.
No benhmark resultshouldever betakenat fae value.
|BRIANKERNIGHANandCHRISTOPHERVAN WYK (1998)
MMIX versus reality. A person who understands the rudiments of MMIX
programminghasaprettygoodideaofwhattoday'sgeneral-purposeomputers
an doeasily; MMIX is very muh likeallof them. ButMMIX hasbeenidealized
in several ways, partly beausethe author hastried to design a mahine that
is somewhat \aheadof its time" so that it won't beome obsolete too quikly.
Therefore a briefomparison between MMIX and the omputers atually being
builtattheturnofthemillenniumisappropriate. Themaindierenesbetween
MMIXandthosemahinesare:
Commerialmahines donotignore thelow-order bitsof memoryaddresses,
asMMIXdoeswhenaessingM
8
[A℄;theyusuallyinsistthatAbeamultiple
of8. (Wewillndmanyusesforthose preiouslow-orderbits.)
Commerial mahines are usually deient in theirsupport of integer arith-
meti. Forexample,theyalmostneverproduethetruequotientbx=yand
trueremainderxmodywhenxisnegativeoryisnegative;theyoftenthrow
awaytheupperhalf of aprodut. Theydon'ttreatleft andright shiftsas
strit equivalentsofmultipliationand divisionbypowers of2. Sometimes
theydonotimplementdivisioninhardwareatall;andwhentheydohandle
division, theyusuallyassumethat theupperhalfofthe128-bitdividendis
zero. Suhrestritionsmakehigh-preisionalulationsmore diÆult.
CommerialmahinesdonotperformFINT andFREMeÆiently.
Commerial mahines do not (yet?) have thepowerfulMOR and MXOR opera-
tions. Theyusuallyhavea halfdozenor soadhoinstrutionsthat handle
onlythemostommonspeialasesof MOR.
Commerialmahinesrarelyhavemorethan64general-purposeregisters. The
256 registers of MMIX signiantly derease programlength, beausemany
variables and onstants of a program an live entirely in those registers
instead of in memory. Furthermore, MMIX's register stak is more exible
thantheomparablemehanismsin existingomputers.
All of these pluses forMMIX have assoiated minuses, beauseomputer design
always involves tradeos. The primary design goal for MMIX was to keep the
mahine as simple and lean and onsistent and forward-looking as possible,
withoutsariingspeedandrealismtoogreatly.