HAL Id: inria-00408568
https://hal.inria.fr/inria-00408568
Submitted on 3 Aug 2009
Jade Alglave, Luc Maranget
To cite this version:
Jade Alglave, Luc Maranget. Fences in Weak Memory Models. [Research Report] RR-7010, INRIA. 2009. <inria-00408568>
Research report (rapport de recherche). ISSN 0249-6399. ISRN INRIA/RR--7010--FR+ENG.
Fences in Weak Memory Models
Jade Alglave, Luc Maranget
Research Report N° 7010, July 2009, 39 pages
Centre de recherche INRIA Paris-Rocquencourt
Project-team Moscova
Abstract: We present here an axiomatic framework, implemented in the Coq proof assistant, for defining weak memory models in terms of several parameters: local reorderings of reads and writes, and visibility of inter- and intra-processor communications through memory. In this context, we provide a formal definition of weak memory models induced by architectures, illustrated by definitions of SC and Sparc TSO. Moreover, we define a comparison over architectures, an architecture $A_1$ being weaker than another one $A_2$ when $A_1$ allows more behaviours than $A_2$. In addition, we provide a characterisation of behaviours allowed by $A_1$ which are also valid on $A_2$. By that means, we provide a simple characterisation of SC and TSO behaviours on any weaker architecture. We also provide an abstract notion of what should be the action and placement of fences to restore a given model from a weaker one.

Contents
1 Introduction
  1.1 An axiomatic generic model
  1.2 Study of barriers power
  1.3 Case study: a Power model
2 Description of the model
  2.1 Axiomatisation
    2.1.1 Basic objects
    2.1.2 Execution witnesses
    2.1.3 Architectures
    2.1.4 Validity of an execution with respect to an architecture
    2.1.5 Examples
  2.2 Comparison of architectures
    2.2.1 Making validity monotonous
    2.2.2 Examples
  2.3 Equivalence with native models
    2.3.1 Sc is SC
    2.3.2 Tso is TSO
  2.4 Testing
    2.4.1 Tools
    2.4.2 Comparison of models
    2.4.3 Characteristic tests
3 Semantics of barriers
  3.1 Barriers guarantee
  3.2 Considering a weaker guarantee
4 Case study: a Power model
  4.1 Complete event structures and execution witnesses
  4.2 Globality of rf maps
  4.3 Preserved program order ppo
  4.4 Values do not come out of thin air
  4.5 Cumulative memory barriers
5 Barrier experiments
  5.1 Official tests
  5.2 Classical tests
  5.3 Experiments
6 Towards a stronger model
  6.1 Extension $\xrightarrow{ppo\text{-}ext}$
  6.2 Semantics of lwsync
7 Conclusion
  7.1 Contribution
  7.2 Status of writes

1 Introduction
Memory models are what describe and constrain the behaviour of a program running on a multiprocessor. That said, understanding what a program would do on such a machine requires a precise definition of the memory model induced by the machine, that is, the underlying memory system and the behaviour of the processors involved. Previous studies [14, 18] have discussed the need for a rigorous definition of weak memory models, which some of the public documentations [3, 4] lack. We provide here a generic and axiomatic framework to precisely define a memory model in terms of several parameters and test it against real hardware.

Let us consider a shared-memory multiprocessor system, that consists of several processors writing to or reading from a common shared memory. We will discuss here what representation of memory and processor behaviour we consider.

Representation of memory. One representation of a shared memory could be a single memory on which several processors operate simultaneously, all their writes being committed to memory as soon as they are issued. Thus, one can consider the connection between processors and memory as direct: as soon as one processor writes to memory, the value written overwrites the previous value and is immediately available to all processors. This property, called store atomicity, has been examined and advocated as valuable [16, 8, 11] as it provides the guarantee that actions on such a memory are serialisable, which leads to a rather understandable memory model. However, it is not guaranteed on several real architectures [1, 10, 3, 4]. They indeed relax the store atomicity constraint, which means a write is not available to all processors at once. For example a write is at first initiated by a given processor, then committed to a cache, and finally to memory. This last step is sometimes called globally performed [15]. Even without assuming writes to be committed immediately, we suppose a total order on the globally performed writes to the same location, a property sometimes called coherence [5] that is widely assumed by modern architectures [1, 10, 3, 4].

Processor behaviour. One representation of a processor behaviour could suppose a sequential order, consistent with the program order, of all the read and write events issued by a given processor, as a generalisation of the uniprocessor case. However, modern architectures [1, 10, 3, 4] provide relaxed memory models that do not constrain the way reads and writes are ordered that much. These constraints, or their relaxation, are often gathered behind the term instruction reordering [8, 11].
1.1 An axiomatic generic model

We will precisely define an architecture in terms of its ordering and store atomicity relaxations in section 2.1.3. For example, Sequential Consistency (henceforth SC) [17] supposes writes to be committed to memory as soon as they are issued, and that the program order is maintained between all accesses, thus being the strongest memory model (in a sense that will be defined precisely in section 2.2). We will illustrate how to instantiate our model to produce SC and Sparc TSO [1], and show equivalence with the native models, together with a characterisation of executions that would be valid on these models.

    P0              P1
    (a) x ← 1       (c) y ← 1
    (b) r2 ← y      (d) r4 ← x
    i3: r2 = 0 ∧ r4 = 0 ?

Figure 1: i3 exhibits non-SC behaviour on modern architectures
1.2 Study of barriers power

SC provides indeed a rather comfortable programming model, which explains why most architectures provide mechanisms such as barriers and locks to restore it from a weaker model. However, it is not clear how much power a barrier needs to provide the illusion of SC, and where to place these constructions in the code. We examine this question in section 3.1 from a general point of view: we provide a sufficient condition on barriers to restore a stronger model from a weaker one. Moreover, we refine this condition in some particular yet interesting cases in section 3.2, such as TSO [1].

1.3 Case study: a Power model

Our generic framework, implemented in the Coq proof assistant [12], has two companion tools: memevents, written in OCaml, which is an exact implementation of our axiomatic model, and litmus, which runs on real hardware the same inputs that memevents takes.

We provide a series of tests to properly instantiate our model with respect to a given machine or architecture, which allowed us to design a model for a significant fragment of the Power architecture with barriers.

2 Description of the model

2.1 Axiomatisation
The classical test depicted in fig. 1, which can be found in [6] with number 2.3a, illustrates the fact that we cannot use an interleaving semantics to reason on executions induced by weak memory models, as it exhibits a non-SC behaviour on some current architectures [6, 5]. Instead, we reason on relations over read and write events raised by an instruction.

2.1.1 Basic objects

An event is an abstraction of a memory access performed during the execution of a multiprocessor program. We note $\mathbb{E}$ the set of events generated by a particular execution. Events are of two kinds: reads and writes, whose sets will be depicted by $\mathbb{R}$ and $\mathbb{W}$. Henceforth, we will note $e$ for an event, $r$ for a read, and $w$ for a write. An event $e$ will hold its direction ($R$ or $W$), its location, given by $loc\ e$, its value, given by $val\ e$, and its processor, given by $proc\ e$. We will note (a) x ← v for a write to location x with value v labelled (a), and (b) r1 ← y for a read from y labelled (b).

[Figure 2 shows four diagrams, one per combination of read values. Each has events a: W[x]=1 and b: R[y] on processor 0 (final register 0:r1), and c: W[y]=1 and d: R[x] on processor 1 (final register 1:r2), with po edges from a to b and from c to d. The four outcomes (0:r1, 1:r2) are (0, 0), (0, 1), (1, 0) and (1, 1).]

Figure 2: Event structures for test i3.

An execution is also characterised by the program order $\xrightarrow{po}$, a relation on events that reflects the sequential execution of instructions on a single processor: given two instructions $i_1$ and $i_2$ that generate events $e_1$ and $e_2$, having $e_1 \xrightarrow{po} e_2$ simply means that $i_1$ precedes $i_2$ in program order. $\xrightarrow{po}$ is a total order amongst the events from the same processor¹ and never relates events from different processors.

We collect this information into an event structure, depicted by $E$:

$$E \triangleq (\mathbb{E}, \xrightarrow{po})$$

Figure 2 illustrates the event structures associated to the test i3 depicted in fig. 1.

2.1.2 Execution witnesses
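We postulate two relations over events: $\xrightarrow{rf}$ and $\xrightarrow{ws}$.

Rf. A read-from map links a read event with the write event that provides its value. We represent the notion by a relation from writes to reads, which is well-formed in the following sense:

$$\xrightarrow{prf}\ \triangleq\ \{(w, r) \mid \exists l\, v,\ w \in \mathbb{W}_{l,v} \wedge r \in \mathbb{R}_{l,v}\}$$

$$wf_{rf}\ \xrightarrow{rf}\ \triangleq\ \xrightarrow{rf} \subseteq \xrightarrow{prf}\ \wedge\ \forall r,\ \exists!\, w,\ w \xrightarrow{rf} r$$

We first gather all pairs of writes and reads with the same location $l$ and value $v$, whose sets are depicted by $\mathbb{W}_{l,v}$ and $\mathbb{R}_{l,v}$, and then enforce the uniqueness of read sources.

¹ When some instructions may perform several memory accesses, $\xrightarrow{po}$ should include some of the intra-instruction dependencies [18], thus becoming a partial order on events from a same processor.

Ws. The write serialisation is a total order on the writes to a same location. Thus, we first gather all pairs of writes to the same location, and we require the relation to be a total order on the writes to a same location $l$, whose set will be depicted by $\mathbb{W}_l$:

$$\xrightarrow{pws}\ \triangleq\ \{(w_1, w_2) \mid \exists l,\ w_1 \in \mathbb{W}_l \wedge w_2 \in \mathbb{W}_l\}$$

$$wf_{ws}\ \xrightarrow{ws}\ \triangleq\ \xrightarrow{ws} \subseteq \xrightarrow{pws}\ \wedge\ \forall \ell,\ \text{total order}\ (\xrightarrow{ws} \upharpoonright \mathbb{W}_\ell)\ \mathbb{W}_\ell$$

The notation $\xrightarrow{ws} \upharpoonright \mathbb{W}_\ell$ stands for the restriction of the relation $\xrightarrow{ws}$ to the set $\mathbb{W}_\ell$, i.e. $\xrightarrow{ws} \cap\ (\mathbb{W}_\ell \times \mathbb{W}_\ell)$.

Fr. From these two relations, we deduce a third one, fr:

$$r \xrightarrow{fr} w\ \triangleq\ \exists w',\ w' \xrightarrow{rf} r \wedge w' \xrightarrow{ws} w$$

[Diagram: $w' \xrightarrow{rf} r$ and $w' \xrightarrow{ws} w$, hence $r \xrightarrow{fr} w$.]

As we said, $\xrightarrow{ws}$ orders globally performed writes to the same location; thus, if a write $w'$ is before another write $w$ in $\xrightarrow{ws}$, we know that $w'$ is globally (that is, for every processor) before $w$. Furthermore, if a read $r$ reads from $w'$, we consider $r$ to be globally ordered before the following write $w$: otherwise, there would be no guarantee that $r$ actually read its value from $w'$, thus contradicting the $\xrightarrow{rf}$ relation between them.

Execution witnesses. We gather these relations (except $\xrightarrow{fr}$, as it can be deduced from the others) into an execution witness, depicted by $X$:

$$X \triangleq (\xrightarrow{rf}, \xrightarrow{ws})$$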
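The derivation of fr from rf and ws can be sketched in a few lines of Python. This is only an illustration, not the authors' Coq or OCaml code: relations are modelled as sets of pairs, and the event names `ix` and `iy` (standing for the initial writes to x and y) are hypothetical labels introduced here.

```python
def from_read(rf, ws):
    """fr = {(r, w) | there is a write w' with (w', r) in rf and (w', w) in ws}."""
    return {(r, w) for (w1, r) in rf for (w2, w) in ws if w1 == w2}

# An execution of test i3 in which both reads see the initial values.
# ix and iy are assumed names for the initial writes to x and y.
rf = {("ix", "d"), ("iy", "b")}   # d: R[x]=0 reads ix, b: R[y]=0 reads iy
ws = {("ix", "a"), ("iy", "c")}   # the initial writes come first in ws
fr = from_read(rf, ws)
print(sorted(fr))
```

Under these assumptions the read d, which saw the initial value of x, ends up fr-before the store a that overwrites it.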
Figure 3 adds $\xrightarrow{rf}$ and $\xrightarrow{fr}$ edges to the event structures of figure 2. There are no $\xrightarrow{ws}$ edges among the (non-initialisation) writes shown. However, we see some $\xrightarrow{fr}$ arrows, which follow from the serialisation of init stores (which come first in $\xrightarrow{ws}$) and of stores generated by instructions. For instance, in the leftmost picture, we have $d \xrightarrow{fr} a$. Indeed, the load $d$ reads the initial value of location $x$, which is overwritten (later!) by the store $a$.

We have the associated well-formedness predicate $wf$, being the conjunction of the predicates for $\xrightarrow{rf}$ and $\xrightarrow{ws}$.

Initial and final states. The write serialisation provides a natural way to define the initial and final states of an execution:

$$init\ X\ \triangleq\ \{w \mid \neg(\exists w',\ w' \xrightarrow{ws} w)\}$$

$$final\ X\ \triangleq\ \{w \mid \neg(\exists w',\ w \xrightarrow{ws} w')\}$$

[Figure 3 shows the four event structures of figure 2 with rf and fr edges added.]

Figure 3: Execution witnesses for i3.
2.1.3 Architectures

We define here what we consider to be an architecture.

Preserved program order. We assume a function $ppo$, which gathers all pairs of events that are not to be reordered with respect to the program order $\xrightarrow{po}$. Consider for example the test i3, depicted in fig. 1: the specified outcome would be valid only if writes and reads to different locations could be reordered. Thus, an architecture that would authorise the specified outcome would not include write-read pairs in its preserved program order.

We will note $\xrightarrow{ppo}$ the relation output by this function on a given event structure $E$, which is to be included in $\xrightarrow{po}$. This relation is to be considered global, that is, all processors must behave with respect to the constraints induced by it.

Globality of relations. As stated in the introduction, we consider writes to be non-atomic, that is, not necessarily available to all parts of the memory system at once. Thus, the behaviour of all processors need not include the constraints induced by $\xrightarrow{rf}$ relations. However, we distinguish the constraints induced by internal $\xrightarrow{rf}$, on a same processor, from those induced by external $\xrightarrow{rf}$, from one processor to another. Thus, we split the $\xrightarrow{rf}$ relation into $\xrightarrow{rfi}$, which represents the pairs in $\xrightarrow{rf}$ on the same processor, and $\xrightarrow{rfe}$, which represents the pairs in $\xrightarrow{rf}$ on different processors:

$$w \xrightarrow{rfi} r\ \triangleq\ w \xrightarrow{rf} r \wedge proc\ w = proc\ r$$

$$w \xrightarrow{rfe} r\ \triangleq\ w \xrightarrow{rf} r \wedge proc\ w \neq proc\ r$$

Relations induced by the presence of barriers. We assume given a function $ab$, which, provided an event structure $E$ and an execution witness $X$, defines the relation over events induced by the presence of a barrier in between two instructions in $\xrightarrow{po}$:

$$ab : E \to X \to \mathrm{rln}\ \mathbb{E}$$

where $E$ (resp. $X$) is the type of event structures (resp. execution witnesses).

This information is what defines an architecture for us, depicted by $A$. The booleans $int$ and $ext$ state whether $\xrightarrow{rfi}$ and $\xrightarrow{rfe}$, respectively, are global.

Definition 1 (Architecture)

$$A \triangleq (ppo, int, ext, ab)$$
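As a rough illustration of these four parameters, the following Python sketch models an architecture as a record and splits rf into its internal and external parts. It is not the authors' implementation: the names `Arch`, `int_global`, `ext_global` and `split_rf` are introduced here, and the signature of `ab` is deliberately simplified to take (po, rf) rather than an event structure and an execution witness.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

Rel = Set[Tuple[str, str]]  # a relation over events, as a set of pairs

@dataclass(frozen=True)
class Arch:
    ppo: Callable[[Rel], Rel]       # program order -> preserved program order
    int_global: bool                # the "int" flag: is rfi global?
    ext_global: bool                # the "ext" flag: is rfe global?
    ab: Callable[[Rel, Rel], Rel]   # simplified assumption: (po, rf) -> barrier relation

def split_rf(rf: Rel, proc: Dict[str, int]):
    """Split rf into (rfi, rfe) according to the processor of each event."""
    rfi = {(w, r) for (w, r) in rf if proc[w] == proc[r]}
    rfe = {(w, r) for (w, r) in rf if proc[w] != proc[r]}
    return rfi, rfe

# An SC-like architecture: all of po is preserved, all rf are global,
# and the barrier relation is left empty in this sketch.
sc_like = Arch(ppo=lambda po: set(po), int_global=True, ext_global=True,
               ab=lambda po, rf: set())

proc = {"a": 0, "b": 0, "c": 1}
rfi, rfe = split_rf({("a", "b"), ("a", "c")}, proc)
print(rfi, rfe)
```

Keeping `ppo` and `ab` as functions mirrors the text, where both are computed from a given event structure rather than fixed once and for all.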
2.1.4 Validity of an execution with respect to an architecture

We define here what it means for an execution witness $X$ to be valid on a given architecture $A$.

Uniprocessor behaviour. Some documentations [3] claim that a sole processor is supposed to respect the sequential execution model, that is:

    the model of program execution in which the processor appears to execute one instruction at a time, completing each instruction before beginning to execute the next instruction

Following Alpha [10], we define the processor issue order, depicted by the $\xrightarrow{pio}$ relation, as follows:

$$e_1 \xrightarrow{pio} e_2\ \triangleq\ e_1 \xrightarrow{po} e_2 \wedge loc\ e_1 = loc\ e_2$$

We call $\xrightarrow{hb}$ the union of the three relations $\xrightarrow{rf}$, $\xrightarrow{ws}$ and $\xrightarrow{fr}$:

$$\xrightarrow{hb}\ \triangleq\ \xrightarrow{rf} \cup \xrightarrow{ws} \cup \xrightarrow{fr}$$

Notice that $\xrightarrow{hb}$ is not the proper happens-before relation in the general case, but rather the happens-before of a memory with multi-copy-atomic writes. We define the general happens-before relation in the next section.

To provide our executions the guarantee that they respect the sequential execution model, we require that all the relations $\xrightarrow{rf}$, $\xrightarrow{ws}$ and $\xrightarrow{fr}$ are consistent with the processor issue order, that is:

$$uniproc\ \triangleq\ acyclic(\xrightarrow{hb} \cup \xrightarrow{pio})$$
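The uniproc condition can be tried on a very small example. The Python sketch below is an illustration under stated assumptions, not the authors' tool: relations are sets of pairs, `acyclic` is a plain depth-first search, and the execution used (an initial write `i`, then a write `a` and a read `b` to the same location on one processor) is a hypothetical example introduced here.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    state = {}  # node -> "open" while on the DFS stack, "done" afterwards

    def visit(n):
        state[n] = "open"
        for m in succ.get(n, ()):
            if state.get(m) == "open":
                return False
            if m not in state and not visit(m):
                return False
        state[n] = "done"
        return True

    return all(visit(n) for n in list(succ) if n not in state)

def from_read(rf, ws):
    return {(r, w) for (w1, r) in rf for (w2, w) in ws if w1 == w2}

def uniproc(rf, ws, po, loc):
    """acyclic(hb U pio), with hb = rf U ws U fr and pio = po on a same location."""
    pio = {(e1, e2) for (e1, e2) in po if loc[e1] == loc[e2]}
    hb = rf | ws | from_read(rf, ws)
    return acyclic(hb | pio)

loc = {"i": "x", "a": "x", "b": "x"}
po = {("a", "b")}          # the processor writes x (a), then reads x (b)
ws = {("i", "a")}          # the initial write i precedes a
stale = uniproc({("i", "b")}, ws, po, loc)   # b reads the overwritten value
fresh = uniproc({("a", "b")}, ws, po, loc)   # b reads the processor's own write
print(stale, fresh)
```

In the stale case the read is fr-before the write that precedes it in pio, which closes a cycle, so the execution is rejected; reading the processor's own write is accepted.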
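Figure 4 gives an example of an outcome that is forbidden because of $uniproc$. There are two executions for this outcome, with different write serialisations: $a \xrightarrow{ws} b$ on the left, and $b \xrightarrow{ws} a$ on the right. In the former case, we have $c \xrightarrow{fr} b$ (by $a \xrightarrow{ws} b$ and $a \xrightarrow{rf} c$). Thus, invalidation follows from the cycle $b \xrightarrow{pio} c \xrightarrow{fr} b$. In the latter case, the cycle is $a \xrightarrow{rf} c \xrightarrow{pio} d \xrightarrow{fr} a$, the last step following from $b \xrightarrow{ws} a$ and $b \xrightarrow{rf} d$.

    P0              P1
    (a) x ← 1       (b) x ← 2
                    (c) r2 ← x
                    (d) r3 ← x
    Forbidden: 1:r2=1; 1:r3=2

[Figure 4 shows the two executions of this outcome, with events a: W[x]=1, b: W[x]=2, c: R[x]=1 and d: R[x]=2, and their po-loc, rf, ws and fr edges.]

Figure 4: Invalid executions by uniproc.

All together. We call $\xrightarrow{ghb}$ the union of the relations that are global:

$$\xrightarrow{ghb}\ \triangleq\ \xrightarrow{ppo} \cup \xrightarrow{ws} \cup \xrightarrow{fr} \cup \xrightarrow{rf?} \cup \xrightarrow{ab} \quad\text{with}\quad \xrightarrow{rf?}\ \triangleq\ \xrightarrow{rfi?} \cup \xrightarrow{rfe?}$$

where $\xrightarrow{rfi?}$ (resp. $\xrightarrow{rfe?}$) is $\xrightarrow{rfi}$ (resp. $\xrightarrow{rfe}$) if $int$ (resp. $ext$) is $true$, and the empty relation otherwise.

We can now define what a valid execution is, with respect to an architecture $A$:

Definition 2 (Valid execution)

$$A.valid\ E\ X\ \triangleq\ wf \wedge uniproc \wedge acyclic(\xrightarrow{ghb})$$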
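The role of the $int$ and $ext$ flags in ghb can be seen on a small example. The following Python sketch only covers the last conjunct of the definition, acyclic(ghb), and assumes that wf and uniproc hold; the message-passing execution and the event names (`ix`, `iy` for initial writes) are hypothetical and introduced here for illustration.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    state = {}

    def visit(n):
        state[n] = "open"
        for m in succ.get(n, ()):
            if state.get(m) == "open":
                return False
            if m not in state and not visit(m):
                return False
        state[n] = "done"
        return True

    return all(visit(n) for n in list(succ) if n not in state)

def ghb(ppo, ws, fr, rfi, rfe, ab, int_global, ext_global):
    """Union of the global relations; rfi and rfe only count when flagged global."""
    rel = set(ppo) | set(ws) | set(fr) | set(ab)
    if int_global:
        rel |= set(rfi)
    if ext_global:
        rel |= set(rfe)
    return rel

# P0: a: W[x]=1 ; b: W[y]=1      P1: c: R[y]=1 ; d: R[x]=0
ppo = {("a", "b"), ("c", "d")}    # assume all of po is preserved
ws = {("ix", "a"), ("iy", "b")}
rfe = {("b", "c"), ("ix", "d")}   # c reads b, d reads the initial write to x
fr = {("d", "a")}                 # d read ix, and ix is ws-before a

with_ext = acyclic(ghb(ppo, ws, fr, set(), rfe, set(), True, True))
without_ext = acyclic(ghb(ppo, ws, fr, set(), rfe, set(), True, False))
print(with_ext, without_ext)
```

With external rf global, the chain a, b, c, d, a is a cycle and the execution is rejected; dropping rfe from ghb removes the b to c edge and the same execution passes this check.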
Weak Memory Models. Let $W$ be the type of memory models, defined as follows:

$$W \triangleq E \to X \to \{\top, \bot\}$$

We thus define a function $Wmm$ which, $A$ being the type of architectures, produces the weak memory model induced by an architecture $A$:

Definition 3 (Weak Memory Model)

$$Wmm : A \to W$$

$$Wmm(A)\ \triangleq\ \forall E\ X,\ A.valid\ E\ X$$

Henceforth, we will note $A_{Wmm}$ for $Wmm(A)$.

2.1.5 Examples

We will show how to produce a particular model from our generic framework on two classical memory models, Sequential Consistency [17], later on referred to as SC, and TSO [1], thus illustrating the concepts we used to define our framework. We will show in section 2.3 that these definitions are equivalent to the native ones.

Sequential consistency. SC has been defined by Lamport as follows:

    The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. [17]

We give here a formal definition of an SC execution. We need at first a sequential execution $\xrightarrow{ex}$, that is, a total order consistent with the program order:

$$seq\ \xrightarrow{ex}\ \triangleq\ \text{total order}\ \xrightarrow{ex}\ \mathbb{E}\ \wedge\ \xrightarrow{po} \subseteq \xrightarrow{ex}$$

We need to highlight the implicit execution model, which states that a read $r$ reads from the most recent write that is before it in $\xrightarrow{ex}$. Let us note $pw_o(r)$ the set of previous writes for $r$ in a partial order $o$, in order to define the $\xrightarrow{rf}$ relation for an SC execution, that is, which read reads from which write:

$$SC.rf\ \xrightarrow{ex}\ \triangleq\ \{(w, r) \mid w \xrightarrow{prf} r \wedge w = \max\ pw_{\xrightarrow{ex}}(r)\}$$

Thus a valid SC execution will be given by a sequential execution $\xrightarrow{ex}$ and the calculation of its induced $\xrightarrow{rf}$ relation as above.

From such an execution, we can produce an execution witness:

$$SC.ws\ \xrightarrow{ex}\ \triangleq\ \{(w_1, w_2) \mid w_1 \xrightarrow{ex} w_2 \wedge w_1 \xrightarrow{pws} w_2\}$$

$$SC.wit\ \xrightarrow{ex}\ \triangleq\ (SC.rf\ \xrightarrow{ex},\ SC.ws\ \xrightarrow{ex})$$

We propose here an alternative notion of SC, which we will show equivalent to the native one in thm. 3:

$$Sc.Arch\ \triangleq\ (\xrightarrow{po}, true, true, \xrightarrow{ab})$$

$$Sc.Wmm\ \triangleq\ Wmm(Sc.Arch)$$
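The construction of an SC witness from a sequential execution can be sketched as follows. This is an illustrative Python reading of the definitions above, not the authors' code: the total order ex is given as a list (earliest event first), initial writes are assumed to appear explicitly at its head under the hypothetical names `ix` and `iy`, and values are left implicit instead of being checked through prf.

```python
def sc_rf(ex, kind, loc):
    """Each read takes its value from the most recent earlier write to its location."""
    rf = set()
    last_write = {}
    for e in ex:
        if kind[e] == "W":
            last_write[loc[e]] = e
        else:
            rf.add((last_write[loc[e]], e))
    return rf

def sc_ws(ex, kind, loc):
    """Writes to a same location, ordered as in ex."""
    writes = [e for e in ex if kind[e] == "W"]
    return {(w1, w2)
            for i, w1 in enumerate(writes)
            for w2 in writes[i + 1:]
            if loc[w1] == loc[w2]}

# Events of test i3, with assumed initial writes ix and iy.
kind = {"ix": "W", "iy": "W", "a": "W", "b": "R", "c": "W", "d": "R"}
loc = {"ix": "x", "iy": "y", "a": "x", "b": "y", "c": "y", "d": "x"}

# One interleaving consistent with po (a before b, c before d).
ex = ["ix", "iy", "a", "b", "c", "d"]
rf = sc_rf(ex, kind, loc)
ws = sc_ws(ex, kind, loc)
print(sorted(rf), sorted(ws))
```

For this interleaving b reads the initial value of y and d reads the value written by a, which corresponds to the outcome 0:r1=0; 1:r2=1.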
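TSO. To design a proper TSO execution, we need to require what the Sparc documentation [1] specifies:

$$R{*}\ \triangleq\ \{(r, e) \mid r \xrightarrow{po} e\}$$

$$WW\ \triangleq\ \{(w_1, w_2) \mid w_1 \xrightarrow{po} w_2\}$$

$$ptso\ \xrightarrow{ex}\ \triangleq\ \text{partial order}\ \xrightarrow{ex}\ \mathbb{E}\ \wedge\ R{*} \subseteq \xrightarrow{ex}\ \wedge\ WW \subseteq \xrightarrow{ex}\ \wedge\ \exists \xrightarrow{tso},\ \xrightarrow{tso} \subseteq \xrightarrow{ex}\ \wedge\ \text{total order}\ \xrightarrow{tso}\ \mathbb{W}$$

Moreover, we need to highlight the explicit execution model, provided by the $Val$ axiom in the documentation:

$$Val(L_a) = Val(\max_{\xrightarrow{ex}} \{S_a \mid S_a \xrightarrow{ex} L_a \vee S_a \xrightarrow{po} L_a\})$$

which states that a read $r$ reads from the most recent write that is before it in $\xrightarrow{ex} \cup \xrightarrow{po}$. Thus we define the $\xrightarrow{rf}$ relation for a TSO execution, that is, which read reads from which write:

$$TSO.rf\ \xrightarrow{ex}\ \triangleq\ \{(w, r) \mid w \xrightarrow{prf} r \wedge w = \max(pw_{(\xrightarrow{ex} \cup \xrightarrow{po})}(r))\}$$

As in the SC case, we produce an execution witness:

$$TSO.ws\ \xrightarrow{ex}\ \triangleq\ \{(w_1, w_2) \mid w_1 \xrightarrow{pws} w_2 \wedge w_1 \xrightarrow{ex} w_2\}$$

$$TSO.wit\ \xrightarrow{ex}\ \triangleq\ (TSO.rf\ \xrightarrow{ex},\ TSO.ws\ \xrightarrow{ex})$$

We propose here an alternative notion of TSO, which we will show equivalent to the native one in thm. 4:

$$ppo\_tso\ \triangleq\ R{*} \cup WW$$

$$Tso.Arch\ \triangleq\ (ppo\_tso, false, true, ab)$$

$$Tso.Wmm\ \triangleq\ Wmm(Tso.Arch)$$

The $\xrightarrow{ppo}$ is quite clear from the documentation. The Val axiom indicates that the internal $\xrightarrow{rf}$ are not included in $\xrightarrow{ex}$, whereas the external ones are, as the write from which a read reads is the $\max$ of its previous writes in $\xrightarrow{ex}$. Thus we consider $\xrightarrow{rfe}$ to be global, whereas $\xrightarrow{rfi}$ is not.

2.2 Comparison of architectures

From our definition of architecture arises a very simple notion of comparison; we define the predicate weaker among architectures as follows:

Definition 4 (Weaker)

$$A_1 \leq A_2\ \triangleq\ ppo_1 \subseteq ppo_2\ \wedge\ (int_1 \to int_2)\ \wedge\ (ext_1 \to ext_2)\ \wedge\ ab_1 \subseteq ab_2$$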
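Definition 4 translates almost literally into code once the relations are fixed. The Python sketch below is a simplification and not the authors' definition: it compares architectures on one given event structure, so `ppo` and `ab` are plain sets of pairs rather than functions, and the dictionary keys are names chosen here.

```python
def weaker(a1, a2):
    """a1 <= a2: a1 preserves no more than a2, on one fixed event structure."""
    return (a1["ppo"] <= a2["ppo"]
            and (not a1["int"] or a2["int"])   # int1 implies int2
            and (not a1["ext"] or a2["ext"])   # ext1 implies ext2
            and a1["ab"] <= a2["ab"])

# Program order of test i3: two write-read pairs.
po = {("a", "b"), ("c", "d")}

sc = {"ppo": set(po), "int": True, "ext": True, "ab": set()}
# On this event structure, R* U WW contains no pair, since both pairs are write-read.
tso = {"ppo": set(), "int": False, "ext": True, "ab": set()}

print(weaker(tso, sc), weaker(sc, tso))
```

On this event structure the TSO-like parameters are weaker than the SC-like ones, and not conversely, since SC preserves the write-read pairs and makes internal rf global.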
Theorem 1 (Validity is decreasing)

$$\forall A_1\ A_2,\ A_1 \leq A_2 \Rightarrow \forall E\ X,\ A_2.valid\ E\ X \to A_1.valid\ E\ X$$

Proof [in Coq]. From $A_1 \leq A_2$, we have $A_1.ghb \subseteq A_2.ghb$, thus if $A_2.ghb$ is acyclic, so is $A_1.ghb$.

2.2.1 Making validity monotonous

We define here a criterion to check if an execution $X$ running on an architecture $A_1$ would be valid on a stronger architecture $A_2$:

$$A_1.check_{A_2}\ \triangleq\ acyclic(A_2.ghb)$$

We show that this criterion characterises an execution running on $A_1$ that would be valid on $A_2$:

Theorem 2 (Characterisation)

$$\forall A_1\ A_2,\ A_1 \leq A_2 \Rightarrow \forall E\ X,\ A_1.valid\ E\ X \wedge A_1.check_{A_2}\ E\ X \leftrightarrow A_2.valid\ E\ X$$
Proof [in Coq].

⇒ $X$ being valid on $A_1$, we have all the requirements (well-formedness and uniproc) to guarantee that it is valid on $A_2$, except the last predicate, which holds by the hypothesis $check_{A_2}$.

⇐ $X$ being valid on $A_2$ gives us all the requirements (well-formedness and uniproc) to guarantee its validity on $A_1$, except the last one. As $A_1 \leq A_2$, we know that $A_1.ghb \subseteq A_2.ghb$ (lemma ghb_incl), thus the acyclicity requirement for $A_1.ghb$ holds if $A_2.ghb$ is acyclic.

2.2.2 Examples

Sc. In the context of our generic framework, we designed a criterion to decide if a particular execution $X$, with respect to an event structure $E$ and on an architecture $A$, is Sc:

$$A.check_{Sc}\ \triangleq\ acyclic(\xrightarrow{po} \cup \xrightarrow{hb})$$
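As an illustration of this criterion, the Python sketch below evaluates acyclic(po ∪ hb) on the four executions of test i3. It is not the memevents implementation: the event names follow figures 2 and 3, `ix` and `iy` are assumed names for the initial writes, and `r1`, `r2` stand for the values read by b and d.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    state = {}

    def visit(n):
        state[n] = "open"
        for m in succ.get(n, ()):
            if state.get(m) == "open":
                return False
            if m not in state and not visit(m):
                return False
        state[n] = "done"
        return True

    return all(visit(n) for n in list(succ) if n not in state)

def i3_execution(r1, r2):
    """Relations of the i3 execution in which b reads r1 from y and d reads r2 from x."""
    po = {("a", "b"), ("c", "d")}
    ws = {("ix", "a"), ("iy", "c")}
    rf = {("c" if r1 else "iy", "b"), ("a" if r2 else "ix", "d")}
    fr = {(r, w) for (w1, r) in rf for (w2, w) in ws if w1 == w2}
    return po, rf, ws, fr

def check_sc(po, rf, ws, fr):
    return acyclic(po | rf | ws | fr)

results = {(r1, r2): check_sc(*i3_execution(r1, r2))
           for r1 in (0, 1) for r2 in (0, 1)}
print(results)
```

Only the execution where both reads return 0 fails the check, through the cycle a, b, c, d, a made of two po edges and two fr edges.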
This criterion characterises valid weak executions that are Sc:

Corollary 1 (Sc characterisation)

$$\forall A\ E\ X,\ A \leq Sc,\ A.valid\ E\ X \wedge A.check_{Sc}\ E\ X \leftrightarrow Sc.valid\ E\ X$$

Proof [in Coq].

⇒ As $\xrightarrow{po} \cup \xrightarrow{hb}\ = Sc.ghb$, this is a direct consequence of thm. 2.

⇐ As $A \leq Sc$, this is a direct consequence of thm. 1.

This result allows us to see that the outcome 0:r1=0; 1:r2=0 for i3 (leftmost picture in figure 3) will never show up on a sequentially consistent machine. All other executions depicted in fig. 3 are SC by the same argument.

Tso. In the context of our generic framework, we designed a criterion to decide if a particular execution $X$, with respect to an event structure $E$ and on an architecture $A$, is Tso; consider $\xrightarrow{hb\_tso}$ to be $\xrightarrow{ws} \cup \xrightarrow{fr} \cup \xrightarrow{rfe}$:

$$A.check_{Tso}\ \triangleq\ acyclic(\xrightarrow{ppo\_tso} \cup \xrightarrow{hb\_tso})$$
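The same experiment can be repeated for this criterion. The Python sketch below is again only an illustration: it assumes that initial writes (named `ix` and `iy` here) belong to no program processor, so that every rf edge of i3 is external, and it computes ppo_tso as the pairs of po that start with a read or end with a write.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    state = {}

    def visit(n):
        state[n] = "open"
        for m in succ.get(n, ()):
            if state.get(m) == "open":
                return False
            if m not in state and not visit(m):
                return False
        state[n] = "done"
        return True

    return all(visit(n) for n in list(succ) if n not in state)

KIND = {"a": "W", "b": "R", "c": "W", "d": "R"}
PROC = {"ix": -1, "iy": -1, "a": 0, "b": 0, "c": 1, "d": 1}  # -1: assumed init "processor"

def ppo_tso(po):
    """R* U WW: every pair of po except write-read pairs."""
    return {(e1, e2) for (e1, e2) in po if KIND[e1] == "R" or KIND[e2] == "W"}

def check_tso(r1, r2):
    po = {("a", "b"), ("c", "d")}
    ws = {("ix", "a"), ("iy", "c")}
    rf = {("c" if r1 else "iy", "b"), ("a" if r2 else "ix", "d")}
    fr = {(r, w) for (w1, r) in rf for (w2, w) in ws if w1 == w2}
    rfe = {(w, r) for (w, r) in rf if PROC[w] != PROC[r]}
    return acyclic(ppo_tso(po) | ws | fr | rfe)

results = {(r1, r2): check_tso(r1, r2) for r1 in (0, 1) for r2 in (0, 1)}
print(results)
```

Because the two write-read pairs of i3 are not in ppo_tso, the cycle that rules out the outcome 0:r1=0; 1:r2=0 under the Sc criterion disappears, and all four executions pass.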
This criterion characterises valid weak executions that are Tso:

Corollary 2 (Tso characterisation)

$$\forall A\ E\ X,\ A \leq Tso,\ A.valid\ E\ X \wedge A.check_{Tso}\ E\ X \leftrightarrow Tso.valid\ E\ X$$

Proof [in Coq].

⇒ As $\xrightarrow{ppo\_tso} \cup \xrightarrow{hb\_tso}\ = Tso.ghb$, this is a direct consequence of thm. 2.

⇐ As $A \leq Tso$, this is a direct consequence of thm. 1.

This result allows us to conclude that all the outcomes for i3 specified in fig. 3 may show up on a Tso machine.

2.3 Equivalence with native models

2.3.1 Sc is SC
We show that the SC definition from [17] is equivalent to our definition:

Theorem 3 (Sc is SC)

$$\forall E\ X,\ Sc.valid\ E\ X \leftrightarrow \exists \xrightarrow{ex},\ seq\ \xrightarrow{ex} \wedge\ SC.wit\ \xrightarrow{ex}\ = X$$

Proof [in Coq].

⇒ From $X$ being valid on $Sc$, we have $acyclic(\xrightarrow{ghb})$, which means $acyclic(\xrightarrow{hb} \cup \xrightarrow{po})$ on $Sc$. We know by cor. 1 that this condition is necessary and sufficient to obtain an equivalent SC execution.

⇐ From the sequential execution $\xrightarrow{ex}$, we produce an $SC.wit$ which is valid on any weaker architecture by thm. 1.

2.3.2 Tso is TSO

We show that the TSO definition from [1] is equivalent to our definition:

Theorem 4 (Tso is TSO)

$$\forall E\ X,\ Tso.valid\ E\ X \leftrightarrow \exists \xrightarrow{ex},\ ptso\ \xrightarrow{ex} \wedge\ TSO.wit\ \xrightarrow{ex}\ = X$$

Proof [in Coq].

⇒ From $X$ being valid on $Tso$, we know that $X$ satisfies $check_{Tso}$ by cor. 2. $check_{Tso}$ gives us an acyclic relation, therefore a partial order on $\mathbb{E}$, such that its restriction to $\mathbb{W}$ is the total order on stores required by TSO. As $Tso.ghb$ includes $R{*}$ and $WW$ by construction, we have the final requirements to provide an execution valid on TSO.

⇐ From $\xrightarrow{ex}$, we produce a $TSO.wit$ which is valid on any architecture weaker than $Tso$ by thm. 1.

2.4 Testing

In this section we precisely define our testing methodology and describe our tools.
2.4.1 Tools
litmus. To understand the memory model provided by a given machine $M$, we use litmus tests, which are assembly programs with a specified initial state of memory and registers. To run them on a machine, we use our litmus tool, which runs a C skeleton into which the litmus test is encapsulated. For a given test $t$ running on $M$, we collect the final content of memory and registers, thus defining a set of observed outcomes $O_M(t)$.

memevents. To compare the memory model as observed on a machine with our theoretical one, we implemented our generic framework in the memevents tool, written in OCaml. The main module, axiom, is an implementation of the theory presented in section 2: provided an architecture module $A$ such as $Sc.Arch$ or $Tso.Arch$, it outputs all possible execution witnesses (in the absence of loops) that are valid in the memory model $W$ induced by $A$, in the sense of the $valid$ predicate defined in section 2.1.4, and whose $final$ states define the set of valid outcomes $V_W(t)$. When there are loops, it unfolds them several times, which gives a subset of the valid executions; this has been enough for our purposes. Moreover, memevents is able to output a counter-example: when a particular outcome is specified, it shows which cycles in the $\xrightarrow{ghb}$ relation invalidate this execution. This gives an insight into why this execution is not allowed on a particular architecture, and whether barriers are needed or not.

2.4.2 Comparison of models

An additional tool, compare, examines, for a given test $t$ run on a machine $M$, the following cases: $O_M(t) \subseteq V_W(t)$, from which we know our model is not invalidated, and $O_M(t) \not\subseteq V_W(t)$, from which we know our model is invalidated. When $O_M(t) \subseteq V_W(t)$, the most challenging case is when some outcome of $t$ is in $V_W(t)$ yet not in $O_M(t)$, that is, $t$ has an outcome which is valid yet not observed. Several reasons explain this situation: either $t$ has not been run enough to observe it, or the tested machine does not implement the feature highlighted by the test. In the latter case the model is too permissive with respect to this machine. However, we do not seek an exact match between $O_M(t)$ and $V_W(t)$: doing so would lead us to particularise our model so that it renders the model of the tested machine. As we want to give a model of an architecture, we should on the contrary define a looser model which includes the observed outcomes of any machine that implements the architecture.

To be more precise, given an architecture $A$, a model $W = Wmm(A)$ and an implementation $M$ of $A$, we define two requirements that $W$ must satisfy to be valid and accurate with respect to $M$:

Definition 5 (Validity and accuracy of a model)

$$valid\ W\ \triangleq\ \forall M,\ \forall t,\ O_M(t) \subseteq V_W(t)$$

$$accurate_M\ W\ \triangleq\ \forall t,\ V_W(t) \subseteq O_M(t)$$
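Both requirements are set inclusions, which the following Python sketch spells out. It is not the compare tool: the function names and the data layout are assumptions made here, and the outcome sets are made-up illustrative values, not measurements from any machine.

```python
def model_valid(observed, allowed):
    """valid W: every outcome observed on any machine is allowed by the model.

    observed: {machine: {test: set of outcomes}}, allowed: {test: set of outcomes}.
    """
    return all(outcomes <= allowed[test]
               for tests in observed.values()
               for test, outcomes in tests.items())

def model_accurate(observed_on_m, allowed):
    """accurate_M W: every outcome allowed by the model is observed on machine M."""
    return all(allowed[test] <= observed_on_m.get(test, set()) for test in allowed)

# Made-up data: the model allows all four outcomes of i3,
# and a hypothetical machine m1 never exhibits (0, 0).
allowed = {"i3": {(0, 0), (0, 1), (1, 0), (1, 1)}}
observed = {"m1": {"i3": {(0, 1), (1, 0), (1, 1)}}}

is_valid = model_valid(observed, allowed)
is_accurate = model_accurate(observed["m1"], allowed)
print(is_valid, is_accurate)
```

With this data the model is valid but not accurate for m1, which is the situation the text accepts: the model may be looser than a given implementation.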
    test    observed        never observed
    i5      int = false     int = true
    i6      ext = false     ext = true
    i3      WR ⊄ ppo        WR ⊆ ppo
    i4      WW ⊄ ppo        WW ⊆ ppo
    i1      RW ⊄ ppo        RW ⊆ ppo
    i2      RR ⊄ ppo        RR ⊆ ppo

Figure 5: Summary of characteristic tests

[Figure 6 shows one execution diagram per test: i1 (RW relaxation), i2 (RR relaxation), i3 (WR relaxation), i4 (WW relaxation), i5 (rfi relaxation) and i6 (rfe relaxation).]

Figure 6: Characteristic tests exhibiting relaxations
Thus we require our model to be $valid$ but not necessarily $accurate$ with respect to all its implementations.

2.4.3 Characteristic tests
We present here the key tests to understand how to instantiate the parameters of our model. Let us assume given a machine $M$ that implements a model $W$. The main idea is to observe one relaxation of the store atomicity or ordering constraints at a time, by considering an execution where all relations involved are global, except the one in question. Thus, if the specified outcome is observed, $M$ exhibits this relaxation; otherwise there would have been a cycle in the validity check of this execution, which would have been forbidden.

We assume here that it is always possible to maintain an $R{*}$ pair in program order, using a dependency between the two accesses. This does not mean we consider all $R{*}$ pairs to be preserved in program order, but only the ones that have a dependency between them. Maintained $RR$ (resp. $RW$) pairs will be depicted by ll (resp. ls).

Globality of rf maps

Internal rf maps
    P0              P1
    (a) x ← 1       (d) y ← 1
    (b) r1 ← x      (e) r3 ← y
        ll              ll
    (c) r2 ← y      (f) r4 ← x
    i5: r1 = r3 = 1 ∧ r2 = r4 = 0 ?

The test i5 can be found in [6], with number 2.4: it is claimed to highlight a feature called intra-processor forwarding, and illustrates the visibility of store buffering to the programmer. If the specified outcome of this test is observed, then we consider $\xrightarrow{rfi}$ not to be global on $M$. If the specified outcome never shows up, then $\xrightarrow{rfi}$ can be considered global. Indeed, as depicted in fig. 6, if internal $\xrightarrow{rf}$ were global, there would be a bold $\xrightarrow{rf}$ between events $a$ and $b$ ($W[x]$ and $R[x]$ from $P_0$) and between events $d$ and $e$ ($W[y]$ and $R[y]$ from $P_1$). This would lead to a cycle $a \xrightarrow{rfi} b \xrightarrow{ppo} c \xrightarrow{fr} d \xrightarrow{rfi} e \xrightarrow{ppo} f \xrightarrow{fr} a$ in this execution, which would therefore be forbidden.

External rf maps

    P0              P1              P2
    (a) x ← 1       (b) r1 ← x      (d) r2 ← y
                        ls              ll
                    (c) y ← 1       (e) r3 ← x
    i6: r1 = 1 ∧ r2 = 1 ∧ r3 = 0 ?

The test i6 can also be found in [6], with number 2.6, or in the literature under the name WRC [13], and finally in the Power documentation with name isa1 [3]. If the specified outcome of this test is observed, then we consider $\xrightarrow{rfe}$ not to be global on $M$. If the specified outcome never shows up, then $\xrightarrow{rfe}$ can be considered global. Indeed, as depicted in fig. 6, if external $\xrightarrow{rf}$ were global, there would be a bold $\xrightarrow{rf}$ between events $a$ and $b$ ($W[x]$ from $P_0$ and $R[x]$ from $P_1$) and between events $c$ and $d$ ($W[y]$ from $P_1$ and $R[y]$ from $P_2$). This would lead to a cycle $a \xrightarrow{rfe} b \xrightarrow{ppo} c \xrightarrow{rfe} d \xrightarrow{ppo} e \xrightarrow{fr} a$ in this execution, which would therefore be forbidden.

When any $\xrightarrow{rf}$ is observed not to be global on one machine, since this is the weakest condition on $\xrightarrow{rf}$, one should set the corresponding parameter to $false$ for any machine that implements $W$; otherwise $W$ could be invalidated.

Preserved program order

WR pairs. The test i3, which is depicted in fig. 1, can be found in [6], with number 2.3a. If the specified outcome is observed, then $WR$ pairs are not preserved in program order: the $ppo$ parameter of this machine should not include $WR$ pairs. If the specified outcome never shows up, then the $ppo$ of this machine includes $WR$. Indeed, as depicted in fig. 6, if $WR$ pairs were global, there would be a bold $\xrightarrow{ppo}$ between events $a$ and $b$ ($W[x]$ and $R[y]$ on $P_0$) and between $c$ and $d$ ($W[y]$ and $R[x]$ on $P_1$). This would lead to a cycle $a \xrightarrow{ppo} b \xrightarrow{fr} c \xrightarrow{ppo} d \xrightarrow{fr} a$ in this execution, which would therefore be forbidden.

WW pairs.

    P0              P1
    (a) x ← 1       (c) y ← 3
    (b) y ← 2       (d) x ← 4
    i4: x = 1 ∧ y = 3 ?

If the specified outcome of the test i4 is observed, then $WW$ pairs are not maintained in program order. If it never shows up, then the $ppo$ of this machine includes $WW$. Indeed, as depicted in fig. 6, if $WW$ pairs were global, there would be a bold $\xrightarrow{ppo}$ between events $a$ and $b$ ($W[x]$ and $W[y]$ on $P_0$) and between $c$ and $d$ ($W[y]$ and $W[x]$ on $P_1$). This would lead to a cycle $a \xrightarrow{ppo} b \xrightarrow{ws} c \xrightarrow{ppo} d \xrightarrow{ws} a$ in this execution, which would therefore be forbidden.

RW pairs.

    P0              P1
    (a) r1 ← x      (c) y ← 2
    (b) y ← 1       fence
                    (d) x ← 1
    i1: r1 = 1 ∧ y = 2 ?

If the specified outcome of the test i1 is observed, then $RW$ pairs are not maintained in program order. If it never shows up, then the $ppo$ of this machine includes $RW$. Indeed, as depicted in fig. 6, if $RW$ pairs were global, there would be a bold $\xrightarrow{ppo}$ between events $a$ and $b$ ($R[x]$ and $W[y]$ on $P_0$). If the barrier on $P_1$ is B-cumulative, it orders events $c$ and $d$ but also $c$ and $a$, since $a$ reads from $d$. This would lead to a cycle $a \xrightarrow{ppo} b \xrightarrow{ws} c \xrightarrow{ab} a$ in this execution, which would therefore be forbidden.

RR pairs.

    P0              P1
    (a) x ← 1       (c) r3 ← y
    fence
    (b) y ← 1       (d) r4 ← x
    i2: r3 = 1 ∧ r4 = 0 ?

The test i2 can be found in [6], with number 2.1. If $M$ has global external $rf$ and preserves $WW$ pairs, and the specified outcome is observed, then $RR$ pairs are not preserved in $ppo$. If the specified outcome is never observed under the same hypothesis, then $ppo$ includes $RR$ pairs.

If $M$ does not have global external $rf$, we can consider the same test modified so that a B-cumulative barrier is between the instructions on $P_0$: the B-cumulativity enforces a global ordering between (a) $W[x]$ on $P_0$ and (c) $R[y]$ on $P_1$. If the specified outcome is observed, we can conclude that $RR$ pairs are not in $ppo$: otherwise, there would be a bold $\xrightarrow{ppo}$ between $c$ and $d$ ($R[y]$ and $R[x]$ on $P_1$), which would lead to a cycle $a \xrightarrow{ab} c \xrightarrow{ppo} d \xrightarrow{fr} a$ in the execution, which would therefore be forbidden.

3 Semantics of barriers

3.1 Barriers guarantee
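Let us consider two architectures $A_1 \leq A_2$. We examine here what the barriers provided by $A_1$ should guarantee to restore $A_2$.

We note $\xrightarrow{rf_{2 \setminus 1}}$ for $\xrightarrow{rf_2?} \setminus \xrightarrow{rf_1?}$. We define the predicate $A_1.fb$ (for fully barriered) on $A_1$ as follows:

$$\xrightarrow{ab_1}\ =\ (\xrightarrow{rf_{2 \setminus 1}})?\ ;\ \xrightarrow{ppo_2}\ ;\ (\xrightarrow{rf_{2 \setminus 1}})?$$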
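The right-hand side of this condition is a relational composition, which can be computed directly. In the Python sketch below, which is an illustration and not the authors' definition, the postfix `?` is read as "zero or one step" (the relation together with the identity); this reading, the helper names and the three-event example are assumptions made here.

```python
def compose(r, s):
    """Sequential composition r ; s of two relations given as sets of pairs."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def optional(r, events):
    """r? read as 'zero or one step of r': r together with the identity on events."""
    return set(r) | {(e, e) for e in events}

def fully_barriered_pairs(rf_diff, ppo2, events):
    """Pairs of (rf_diff)? ; ppo2 ; (rf_diff)?, i.e. what ab1 has to order."""
    opt = optional(rf_diff, events)
    return compose(compose(opt, ppo2), opt)

# Hypothetical chain: a write a is read by b on another processor,
# and b precedes c in the stronger architecture's preserved program order.
events = {"a", "b", "c"}
rf_diff = {("a", "b")}   # rf edge that is global on A2 but not on A1
ppo2 = {("b", "c")}

required = fully_barriered_pairs(rf_diff, ppo2, events)
print(sorted(required))
```

The pair (b, c) corresponds to restoring the program-order pair itself, and (a, c) to ordering the beginning of the rf;ppo chain with its end.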
We show that this is a sufficient condition on the barriers provided by $A_1$ to restore an execution valid on $A_2$:

Theorem 5 (Barriers guarantee)

$$\forall A_1\ A_2,\ A_1 \leq A_2 \Rightarrow \forall E\ X,\ A_1.valid\ E\ X \wedge A_1.fb\ E\ X \Rightarrow A_2.valid\ E\ X$$
Proof [in Coq] Suppose that $X$ is not valid on $A_2$: thus we have a cycle in $A_2.ghb$, that is, in $\xrightarrow{\text{ws}} \cup \xrightarrow{\text{fr}} \cup \xrightarrow{\text{rf}_2?} \cup \xrightarrow{\text{ppo}_2}$. Such a cycle is a cycle in $\xrightarrow{\text{ws}} \cup \xrightarrow{\text{fr}} \cup \xrightarrow{\text{rf}_1?} \cup \xrightarrow{\text{ppo}_1} \cup (\xrightarrow{\text{rf}_{2\setminus 1}})^{?} \cup \xrightarrow{\text{ppo}_2} \cup (\xrightarrow{\text{rf}_{2\setminus 1}})^{?}$, which implies a cycle in $\xrightarrow{\text{ws}} \cup \xrightarrow{\text{fr}} \cup \xrightarrow{\text{rf}_1?} \cup \xrightarrow{\text{ppo}_1} \cup \xrightarrow{\text{ab}_1}$, that is, $A_1.ghb$. Thus we contradict the validity of $X$ on $A_1$.

This result provides an insight on what power a barrier provided by an architecture $A_1$ (e.g. PowerPC) should have to restore a stronger model (e.g. SC). First, the barrier should restore the pairs that are preserved in program order on $A_2$ but not on $A_1$. Second, the barrier should compensate the lack of relations between write and read events, which we model by $\xrightarrow{\text{rf}}$ not being a global relation in the general case. Thus, if the $\xrightarrow{\text{rf}}$ relation is not global on $A_1$ but global on $A_2$, we overcome the lack of globality of $\xrightarrow{\text{rf}}$ by ordering the beginning with the end of the chain. This is how we interpret the cumulativity of barriers as stated in the PowerPC documentation [3]. We furthermore interpret the A-cumulativity (resp. B-cumulativity) property as applying to barriers that enforce ordering of pairs in $\xrightarrow{\text{rf}};\xrightarrow{\text{po}}$ (resp. $\xrightarrow{\text{po}};\xrightarrow{\text{rf}}$). We consider a barrier that only preserves pairs in $\xrightarrow{\text{po}}$ to be non-cumulative.

Provided a barrier that has such power, we also have an insight on where to place these barriers in the code: the statement of the theorem indicates that barriers should be in between any pairs in $\xrightarrow{\text{ppo}_2}$ such that one of the components of this pair (or both) may give rise to a $\xrightarrow{\text{rf}}$ relation that is to be global on the stronger architecture $A_2$ but is not on $A_1$.

From any architecture to Sc
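We designed semantics for a barrier that, for any weak memory model induced by an architecture $A$, would suffice to reestablish Sc.

$e_1 \xrightarrow{\text{fenced}} e_2 \triangleq \exists b,\ e_1 \xrightarrow{\text{po}} b \wedge b \xrightarrow{\text{po}} e_2$
$\xrightarrow{\text{fenced}} = \xrightarrow{\text{po}}$   (placement)
$e_1 \xrightarrow{\text{ab}} e_2 \triangleq$
   $e_1 \xrightarrow{\text{fenced}} e_2$   (base)
   $\vee\ e_1 \xrightarrow{\text{rf}} r \wedge r \xrightarrow{\text{ab}} e_2$   (A-cumulativity)
   $\vee\ e_1 \xrightarrow{\text{ab}} w \wedge w \xrightarrow{\text{rf}} e_2$   (B-cumulativity)

This barrier orders all pairs in $\xrightarrow{\text{po}}$, as the base case indicates; it also compensates the possible lack of visibility of $\xrightarrow{\text{rf}}$ on $A$ by ordering the two ends of a chain $\xrightarrow{\text{rf}};\xrightarrow{\text{po}}$ (resp. $\xrightarrow{\text{po}};\xrightarrow{\text{rf}}$), as the A-cumulativity (resp. B-cumulativity) case indicates.

3.2 Considering a weaker guarantee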
We said the barrier should at first restore the pairs that are preserved in program order on the stronger architecture. Thus, for the simple case where none of the components of a pair in $\xrightarrow{\text{ppo}}$ gives rise to a $\xrightarrow{\text{rf}}$ relation that is global on $A_2$ but not on $A_1$, there is no need for a barrier as powerful as above: a barrier that only orders the events that surround it statically would be enough. Consider the $wfb$ predicate: $wfb \triangleq \xrightarrow{\text{ppo}_2} \setminus \xrightarrow{\text{ppo}_1}$; the following result arises as a natural corollary of thm. 5.

Corollary 3 (Non-cumulative barriers guarantee)
$$\forall A_1\, A_2,\ A_1 \le A_2 \wedge \xrightarrow{\text{rfe}_1?} = \xrightarrow{\text{rfe}_2?} \Rightarrow \forall E\, X,\ A_1.valid\ E\ X \wedge A_1.wfb\ E\ X \Rightarrow A_2.valid\ E\ X$$
T so
toSc
As rfe→
are onsidered global in bothT so
andSc
, weonly needanon umulativebarrierto restoreSc
fromT so
. We dene thepairsof writesand readsinpo
→
asfollows:W R ,
{(w, r) | w
po→ r}
. TorestoreSc
fromT so
,weneedtopreservetheW R
pairs,astheyarepreservedonSc
butnotonT so
. Asinternalrf
→
areW R
pairs,su habarrierwould ompensatethela kof visibilityofinternalrf
→
onT so
aswell. Thereforewedenethefollowingbarrier semanti s: fen ed→ =
W R
(pla ement)e
1
ab→ e
2
,
e
1
fen ed→ e
2
(base)We dene thepredi ate
T so.wf b
asfollows:T so.wf b , T so.ab
= W R
, andthefollowingtheoremarisesasanatural onsequen eof or.3:Theorem6(Barriers pla ementon
T so
)Proof[inCoq℄From
X
beingwf b
onT so
,wehaveT so.ghb
=
ws→ ∪
fr→ ∪
ppo_tso→
∪
rfe→ ∪W R
, whi h is a y li sin eX
is valid onT so
. AsW R
oversboth r→
andW R
pairsthat arenotinrf
→
,wegetthea y li ityofSc.ghb
dire tly.Fromanyar hite ture
A
≤ T so
toT so
We designed semantics for a barrier that, for any weak memory model induced by an architecture $A$ weaker than Tso, would reestablish Tso:

$\xrightarrow{\text{fenced}} = \xrightarrow{\text{ppo}_{tso}}$   (placement)
$e_1 \xrightarrow{\text{ab}} e_2 \triangleq$
   $e_1 \xrightarrow{\text{fenced}} e_2$   (base)
   $\vee\ e_1 \xrightarrow{\text{rfe}} r \wedge r \xrightarrow{\text{ab}} e_2$   (A-cumulativity)
   $\vee\ e_1 \xrightarrow{\text{ab}} w \wedge w \xrightarrow{\text{rfe}} e_2$   (B-cumulativity)

Here, all pairs except $WR$ are to be preserved in program order, which is depicted by the placement condition $\xrightarrow{\text{fenced}} = \xrightarrow{\text{ppo}_{tso}}$. As internal $\xrightarrow{\text{rf}}$ are not considered global in Tso, there is no need to compensate them: the ordering power of the barrier concerns only the external $\xrightarrow{\text{rf}}$, as depicted by the A- and B-cumulativity cases.

We retrieve what is described in the Sparc V9 documentation [2]: TSO is indeed obtained from PSO, which is obtained from RMO by barrier placements. In our framework, we define Rmo and Pso as follows, where $R{*}_l$ (resp. $WW_l$) represents all pairs of $R{*}$ (resp. $WW$) to the same location:

$Rmo.Arch \triangleq (R{*}_l \cup WW_l,\ false,\ true,\ ab1)$
$Pso.Arch \triangleq (R{*} \cup WW_l,\ false,\ true,\ ab2)$
T so
, we dedu e from the Val axiom that external rf→
are global, whereasinternalarenot.Thedo umentation spe ies that
P SO
is obtainedfromRM O
by addingLoadLoad
andLoadStore
barriers after ea h read. This statement has two onsequen esinourframework:rst,R∗
pairsarepreservedinP so
,andse ond, torestoreP so
fromRmo
,sin etheexternalrf
→
arealreadyglobalonRmo
,one should useanon umulativebarrierthatpreservesR∗
pairs.T SO
isobtainedfromP SO
byaddingStoreStore
barriersafterea hwrite. We on lude thatW W
pairs are preserved inT so
, and that to restoreT so
fromP so
, oneshould use anon umulativebarrier that preservesW W
pairs. Thus,fromRmo
toT so
,anon umulativebarrierthatpreservesR∗
andW W
isneeded,asstatedbythe or.3.4 Case study: a Power model
In this section we define an architecture for Power, that is, we define relations $\xrightarrow{\text{ppo}}$ and $\xrightarrow{\text{ab}}$ [...] model; we rather confront a tentative PowerPC model against actual PowerPC machines.
4.1 Complete event structures and execution witnesses

Henceforth, we will reason on complete event structures, on which section 2 abstracts. We shall avoid an exhaustive treatment of complete event structures; interested readers may refer to [18, 9]. Instead, we sketch the main ideas.

Additional events
In addition to memory events, the execution of an instruction may generate a variety of events: most instructions generate register events that render accesses to registers, memory barrier instructions generate barrier events, and conditional branch instructions generate commit events that express branching decisions. We note $B$ the set of barrier events and $C$ the set of commits, $b$ and $c$ being typical elements. We shall handle three memory barrier instructions: isync, sync and lwsync. The corresponding events are distinguished by predicates is-isync, etc. As in previous sections, we still denote the set of memory events by $E$ (typical element $e$), the set of memory read events by $R$ (typical element $r$), and the set of memory write events by $W$ (typical element $w$).

Extended or additional relations
We extend the program order relation $\xrightarrow{\text{po}}$ to all events. In particular, $\xrightarrow{\text{po}}$ now orders both memory and barrier events. Moreover, complete event structures comprise additional relations, more specifically intra-instruction causality $\xrightarrow{\text{iico}}$, which represents the ordering constraints of events within a same instruction. In addition, the following relation $\xrightarrow{\text{fenced}(k)}$ renders the presence of a barrier of style $k$ between memory events $e_1$ and $e_2$:

$$e_1 \xrightarrow{\text{fenced}(k)} e_2 \triangleq \exists b,\ \text{is-}k(b) \wedge e_1 \xrightarrow{\text{po}} b \xrightarrow{\text{po}} e_2.$$
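As a minimal sketch of the definition above, written in Python rather than in the Coq development the report relies on, $\xrightarrow{\text{fenced}(k)}$ can be computed from per-thread event sequences. The event names, the `program_order` helper and the encoding of barrier styles as a dictionary are illustrative assumptions, not part of the original framework.

```python
from itertools import combinations

def program_order(threads):
    """All pairs (x, y) such that x precedes y in the same thread."""
    po = set()
    for events in threads:
        # combinations keeps the input order, so x always precedes y
        po.update(combinations(events, 2))
    return po

def fenced(po, barrier_kind, k, memory_events):
    """Pairs of memory events separated in po by a barrier of style k.

    barrier_kind maps each barrier event to its style, e.g. "sync".
    """
    barriers = [b for b, kind in barrier_kind.items() if kind == k]
    return {
        (e1, e2)
        for e1 in memory_events
        for e2 in memory_events
        if any((e1, b) in po and (b, e2) in po for b in barriers)
    }

# Hypothetical single thread: a ; sync ; b ; lwsync ; c
threads = [["a", "s1", "b", "s2", "c"]]
po = program_order(threads)
barrier_kind = {"s1": "sync", "s2": "lwsync"}
memory_events = {"a", "b", "c"}

fenced_sync = fenced(po, barrier_kind, "sync", memory_events)
fenced_lwsync = fenced(po, barrier_kind, "lwsync", memory_events)
```

In this toy thread the sync separates a from both b and c, while the lwsync separates a and b from c.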
Execution witnesses also become more complete, as relation $\xrightarrow{\text{rf}}$ now relates register events. We note $\xrightarrow{\text{rf-reg}}$ the subrelation of $\xrightarrow{\text{rf}}$ that relates register stores to register loads that read their values. As a side note, relation $\xrightarrow{\text{rf-reg}}$ derives from sequential execution in a much stronger sense than $\xrightarrow{\text{rf}}$ on memory. Namely, $w \xrightarrow{\text{rf-reg}} r$ when $w$ is maximal amongst the predecessors of $r$ in program order.

Illustration
Figure 7 shows a program fragment and a fragment of the corresponding complete event structure, together with an execution witness.
The first instruction is an indexed load from memory, lwz: the base address y is taken from register r5 (which has been written into elsewhere), the index is 0, and the value read from the effective address y + 0 is stored into register r2. We here view three events, labelled on the figure as $c$, $a$ and $d$. As an example of intra-instruction dependency, $a \xrightarrow{\text{iico}} d$ expresses that loading from memory precedes storing into register r2.

The compare and branch sequence is less trivial: the instruction cmpwi r2,1 compares the content of r2 to the constant 1 and stores the result of the comparison (2 means equality) into the control register cr0. The next instruction is a conditional branch bne, with branching decision (commit event $h$) conditioned by the contents of control register cr0 ($g \xrightarrow{\text{iico}}$ [...]

[Figure 7: program fragment L1: lwz r2,0(r5); cmpwi r2,1; bne L1; stw r3,0(r6), with events a: R[y]=1; b: W[z]=3; c: R 1:r5=y; d: W 1:r2=1; e: R 1:r2=1; f: W 1:CR0=2; g: R 1:CR0=2; h: Commit; i: R 1:r3=3; j: R 1:r6=z, related by iico, rf-reg, rf, dd, ctrl and po edges]

[...] equal. Thus, since the control register signals equality, the branch is not taken and the next instruction to be executed is the store stw, as shown by the arrows $h \xrightarrow{\text{po}} i$, $h \xrightarrow{\text{po}} j$ and $h \xrightarrow{\text{po}} b$.

4.2 Globality of rf maps

Running tests i5 and i6 on a Power machine yields the specified outcomes. Thus, we consider the $\xrightarrow{\text{rfi}}$ and $\xrightarrow{\text{rfe}}$ relations not to be global for Power.

4.3 Preserved program order ppo
Some parts of the $\xrightarrow{\text{po}}$ program order relation are reflected in the global happens-before relation. In this section we pick those out, defining a preserved-program-order relation $\xrightarrow{\text{ppo}}$.

Data dependencies
Data dependencies within a processor arise from any combination of the reads-from relation on registers and the intra-instruction causality relation:

$$\xrightarrow{\text{dd}} \triangleq (\xrightarrow{\text{rf-reg}} \cup \xrightarrow{\text{iico}})^{+}$$

(here $R^{+}$ denotes the transitive closure of $R$). Note that this relation includes no dependencies via memory.

The restriction of the above to memory events is written $\xrightarrow{\text{dd-mem}}$:

$$\xrightarrow{\text{dd-mem}} \triangleq \xrightarrow{\text{dd}} \cap (M \times M)$$
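The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are encoded as sets of pairs, the closure is computed by a naive fixpoint, and the two-instruction fragment (a load into r2 followed by a store of r2) with its event names is hypothetical rather than taken from Figure 7.

```python
def transitive_closure(rel):
    """Smallest transitive relation containing rel (naive fixpoint)."""
    closure = set(rel)
    while True:
        new_pairs = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs

def data_dependencies(rf_reg, iico):
    """dd = (rf-reg U iico)+"""
    return transitive_closure(rf_reg | iico)

def dd_mem(dd, memory_events):
    """Restriction of dd to pairs of memory events."""
    return {(x, y) for (x, y) in dd if x in memory_events and y in memory_events}

# Hypothetical fragment: "load r2 <- [x]" followed by "store [y] <- r2".
iico = {("a", "d"),     # a: R[x] precedes d: W r2 inside the load
        ("e", "b")}     # e: R r2 precedes b: W[y] inside the store
rf_reg = {("d", "e")}   # the store reads r2 as written by the load
dd = data_dependencies(rf_reg, iico)
dd_memory = dd_mem(dd, memory_events={"a", "b"})
```

Here the only data dependency between memory events is the pair (a, b), obtained through the register r2 and never through memory.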
Control dependencies
A memory write is control dependent on a memory read if there is an intervening commit (of a conditional branch) that is data-dependent on the read and precedes (in program order) the write:

$$r \xrightarrow{\text{ctrl}} w \triangleq \exists c \in C.\ r \xrightarrow{\text{dd}} c \xrightarrow{\text{po}} w$$

This relation models the fact that memory writes are not speculated, whereas reads can be.
Isync dependencies
A memory event is isync-dependent on a memory read if there exists an intervening commit (of a conditional branch) that is data-dependent on the read and is separated (in program order) by an isync from the event:

$$r \xrightarrow{\text{isync}} m \triangleq \exists c \in C.\ r \xrightarrow{\text{dd}} c \wedge c \xrightarrow{\text{fenced(isync)}} m$$

Note that this only adds anything beyond control dependencies in the case of a memory read/read pair.
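Figure 7 gives an example of a control dependency, from load $a$ to store $b$ through commit $h$.

All together
The preserved program order relation is just the union of data dependency for memory events, control dependencies, and isync dependencies:

$$\xrightarrow{\text{ppo}} \triangleq \xrightarrow{\text{dd-mem}} \cup \xrightarrow{\text{ctrl}} \cup \xrightarrow{\text{isync}}$$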
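To make the three components concrete, the following Python sketch builds the control and isync dependencies from a given dd relation and then takes the union. It is an illustration only: the thread, the event names and the assumption that the commit h depends on the read a are hypothetical, and register events are left out of po for brevity.

```python
def ctrl(dd, po, commits, reads, writes):
    """r ctrl-> w iff some commit c satisfies r dd-> c po-> w."""
    return {(r, w) for r in reads for w in writes
            if any((r, c) in dd and (c, w) in po for c in commits)}

def isync_dep(dd, po, commits, isyncs, reads, memory_events):
    """r isync-> m iff r dd-> c and an isync lies between c and m in po."""
    return {(r, m) for r in reads for m in memory_events
            if any((r, c) in dd and (c, s) in po and (s, m) in po
                   for c in commits for s in isyncs)}

def ppo(dd_mem, ctrl_rel, isync_rel):
    """Union of the three components of preserved program order."""
    return dd_mem | ctrl_rel | isync_rel

# Hypothetical thread: a: R[y]; h: commit; s: isync; r: R[x]; b: W[z]
order = ["a", "h", "s", "r", "b"]
po = {(x, y) for i, x in enumerate(order) for y in order[i + 1:]}
dd = {("a", "h")}   # assumed: the branch decision depends on the value read by a
reads, writes = {"a", "r"}, {"b"}

ctrl_rel = ctrl(dd, po, {"h"}, reads, writes)
isync_rel = isync_dep(dd, po, {"h"}, {"s"}, reads, reads | writes)
preserved = ppo(set(), ctrl_rel, isync_rel)
```

In this example the isync dependency contributes the read/read pair (a, r) on top of the control dependency (a, b), in line with the remark that isync only matters beyond control dependencies for read/read pairs.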
Note that memory stores are never the source of a $\xrightarrow{\text{ppo}}$ pair, as a consequence of the instruction semantics: a memory store cannot be the source of an $\xrightarrow{\text{iico}}$ pair, nor of a $\xrightarrow{\text{rf-reg}}$ pair. Thus, a natural partition of $\xrightarrow{\text{ppo}}$ pairs is into load/load pairs and load/store pairs. We include these pairs in the global happens-before relation:

• for load-load pairs: we refer to [5, pp. 653-668], which states that in such a situation, load $r_1$ will be performed before load $r_2$ with respect to any processor, which we interpret as $r_1$ being globally performed before $r_2$. It is not clear to us whether or not these notions are equivalent in the case of a load, or if our interpretation makes sense w.r.t. architectural insights;

• for load-store pairs: we deduce from $r \xrightarrow{\text{ppo}} w$ that $r$ happens before the store is initiated. Since we consider loads to be atomic (thus considering they are globally performed as soon as they are initiated) and stores to be initiated before being globally performed, we deduce that $r$ is globally performed before $w$ is, and include such $\xrightarrow{\text{ppo}}$ pairs in the global happens-before relation.

4.4 Values do not come out of thin air
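In the application of our framework for Power, load-store pairs undergo a particular treatment: we extend load-store $\xrightarrow{\text{ppo}}$ edges by following $\xrightarrow{\text{rf}}$ ones, thus defining another relation $\xrightarrow{\text{ppo-ext}}$, which we will also include in the global happens-before relation. Consider a triple $r \xrightarrow{\text{ppo}} w \xrightarrow{\text{rf}} r'$, where $r'$ does not need to originate from the same processor as $r$ and $w$. Here $r$ happens before the store is initiated, and $r'$ happens after the store is initiated. Thus, intuition suggests that $r$ globally happens before $r'$. We define the extension of $\xrightarrow{\text{ppo}}$ by $\xrightarrow{\text{rf}}$ as follows:

$$r \xrightarrow{\text{ppo-ext}} r' \triangleq \exists w,\ r \xrightarrow{\text{ppo}} w \xrightarrow{\text{rf}} r'$$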
Our extension of load-store dependencies is a generalization of some check that weak models often and arguably must incorporate: the "causality check" of [10], or the "values do not come out of thin air" of [7]. The canonical example of such a check is as follows:

P0             P1
(a) r1 ← x     (c) r1 ← y
(b) y ← r1     (d) x ← r1

We further assume that x and y initially contain 0. Without specific provision in the model, the absurd outcome x = 1; y = 1 might remain valid, as demonstrated by the following execution witness:

[Execution witness: a: R[x]=1; b: W[y]=1; c: R[y]=1; d: W[x]=1; edges a ppo-pro→ b, b rf→ c, c ppo-pro→ d, d rf→ a; final state x=1, y=1]
Our model invalidates the execution thanks to ppo extension. Namely, we have $a \xrightarrow{\text{ppo-ext}} c$ (by $a \xrightarrow{\text{ppo-pro}} b \xrightarrow{\text{rf}} c$) and $c \xrightarrow{\text{ppo-ext}} a$ (by $c \xrightarrow{\text{ppo-pro}} d \xrightarrow{\text{rf}} a$). Hence, $\xrightarrow{\text{ppo}}$ alone is cyclic. A fortiori $\xrightarrow{\text{ghb}}$ is cyclic, since $\xrightarrow{\text{ppo}}$ is included in $\xrightarrow{\text{ghb}}$. Note that we could prevent values from coming out of thin air by adding another sanity check on the $\xrightarrow{\text{rf}}$ relation in our generic framework, following Alpha's documentation.

There are two reasons for considering such an extension to be global, that is, included in the global happens-before relation:

• it rules out some examples in which values would appear out of thin air, as presented in the following example;

• from a global time perspective, $r$ and $r'$ are ordered respectively before and after the point of time when $w$ is initiated; for more details, see section 6, where an example is discussed.

This is perhaps intuitively plausible, and it suffices to rule out some examples in which values would appear out of thin air (we suppose here that the architecture should rule out thin-air reads, though that might be debated), and to correspond with our Power5 experiments on the test presented in section 6. However, such an extension does not seem to be forced. Therefore, it is not clear to us whether we should forbid that behaviour in the model.
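The thin-air argument can be replayed mechanically. The Python sketch below is an illustration under simplifying assumptions: relations are sets of pairs, only the load-store edges and their extension are checked for cycles (the full global happens-before relation is not built), and the helper names `compose` and `is_acyclic` are ours.

```python
def compose(r1, r2):
    """Relational composition r1 ; r2."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def is_acyclic(rel):
    """True iff the directed graph of rel has no cycle (depth-first search)."""
    succ = {}
    for x, y in rel:
        succ.setdefault(x, set()).add(y)
    state = {}  # node -> "open" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "open"
        for nxt in succ.get(node, ()):
            if state.get(nxt) == "open":
                return False          # back edge: a cycle exists
            if nxt not in state and not visit(nxt):
                return False
        state[node] = "done"
        return True

    return all(visit(n) for n in list(succ) if n not in state)

# Canonical example: each processor copies the value it loads to the other location.
load_store = {("a", "b"), ("c", "d")}   # preserved load-store pairs on P0 and P1
rf = {("b", "c"), ("d", "a")}           # each load reads the other processor's store
ppo_ext = compose(load_store, rf)       # r ppo-ext-> r' iff r ppo-> w rf-> r'

valid_without_extension = is_acyclic(load_store)
valid_with_extension = is_acyclic(load_store | ppo_ext)
```

Without the extension the preserved pairs alone are acyclic; adding the two extended edges (a, c) and (c, a) creates the cycle that rules the execution out.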
Illustration
We consider here the litmus test adir1v3 (a variation on [7, Test 1]).

Figure 8 shows the program and a non-SC execution ($\xrightarrow{\text{hb}} \cup \xrightarrow{\text{po}}$ is cyclic). One may first notice the $\xrightarrow{\text{ppo}}$ relation between load $b$ and store $c$. It follows from a data dependency, since the effective address of $c$ is exactly the value read by $b$ (i.e. the address of location y). The relation $\xrightarrow{\text{ghb}}$ is highlighted with black bold arrows. We have:

1. $b \xrightarrow{\text{ppo}} c$ (data dependency), and thus $b \xrightarrow{\text{ppo-ext}} d$ (by $b \xrightarrow{\text{ppo}} c \xrightarrow{\text{rf}} d$ and ppo-extension).
2. $d \xrightarrow{\text{sync}} e$ (sync instruction), and thus $c \xrightarrow{\text{ab}} e$ (by $c \xrightarrow{\text{rf}} d \xrightarrow{\text{sync}} e$ and A-cumulativity).
3. $e \xrightarrow{\text{fr}} a$, by definition of $\xrightarrow{\text{fr}}$.

P0              P1
(a) x ← &y      (d) r3 ← y
(b) r6 ← x      sync
(c) *r6 ← 1     (e) r4 ← x
Initially: x=&z; Allowed: 1:r3=1; 1:r4=&z;
Events: a: W[x]=y; b: R[x]=y; c: W[y]=1; d: R[y]=1; e: R[x]=z
Figure 8: Litmus test adir1v3

Clearly, $\xrightarrow{\text{ghb}}$ is acyclic and the execution is valid. In some sense, validity follows from $a \xrightarrow{\text{rf}} b$ not being global, since there is a cycle $a \xrightarrow{\text{rf}} b \xrightarrow{\text{ghb}} a$. Note that even if we strengthen the $\xrightarrow{\text{ppo}}$ relation by its extension, this outcome is still valid.

This $\xrightarrow{\text{rf}}$ relation is internal to a processor; by considering it not to be global, we model the presence of a store buffer on this processor, which we suppose to be at least a part of the reason why this behaviour is observed.

4.5 Cumulative memory barriers
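sync
The sync barrier is the incarnation of the SC-restoring cumulative barrier described in section 3.1, whose definition we expand for clarity:

$\xrightarrow{\text{sync}} \triangleq \xrightarrow{\text{fenced(sync)}}$
$e_1 \xrightarrow{\text{ab-sync}} e_2 \triangleq$
   $e_1 \xrightarrow{\text{sync}} e_2$   (base)
   $\vee\ e_1 \xrightarrow{\text{rf}} r \xrightarrow{\text{sync}} e_2$   (A-cumulativity)
   $\vee\ e_1 \xrightarrow{\text{sync}} w \xrightarrow{\text{rf}} e_2$   (B-cumulativity)
   $\vee\ e_1 \xrightarrow{\text{rf}} r \xrightarrow{\text{sync}} w \xrightarrow{\text{rf}} e_2$   (A/B-cumulativity)

lwsync
PowerPC features a lightweight cumulative barrier, lwsync, whose semantics we define as follows:

$\xrightarrow{\text{lwsync}} \triangleq \xrightarrow{\text{fenced(lwsync)}} \cap ((W \times W) \cup (R \times E))$
$e_1 \xrightarrow{\text{ab-lw}} e_2 \triangleq$
   $e_1 \xrightarrow{\text{lwsync}} e_2$   (base)
   $\vee\ e_1 \xrightarrow{\text{rf}} r \wedge r \xrightarrow{\text{ab-lw}} e_2 \wedge e_2 \in W$   (A-cumulativity)
   $\vee\ e_1 \xrightarrow{\text{ab-lw}} w \wedge w \xrightarrow{\text{rf}} e_2 \wedge e_1 \in R$   (B-cumulativity)

In other words, lwsync acts as sync except on store-load pairs, the exception impacting both the base and cumulativity cases.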
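The difference between the two barriers can be explored with a small Python sketch of both definitions. It is a tentative encoding: relations are sets of pairs, the recursive definition of ab-lw is computed as a least fixpoint (our reading of the recursion), and the events reuse the names of the isa1 test of section 5.1, with the second run replacing both sync barriers by lwsync as a thought experiment.

```python
def compose(r1, r2):
    """Relational composition r1 ; r2."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

def ab_sync(sync, rf):
    """Base, A-, B- and A/B-cumulativity cases of ab-sync."""
    a_cumul = compose(rf, sync)         # e1 rf-> r sync-> e2
    b_cumul = compose(sync, rf)         # e1 sync-> w rf-> e2
    ab_cumul = compose(rf, b_cumul)     # e1 rf-> r sync-> w rf-> e2
    return sync | a_cumul | b_cumul | ab_cumul

def lwsync_pairs(fenced_lwsync, reads, writes):
    """fenced(lwsync) restricted to (W x W) U (R x E)."""
    return {(e1, e2) for (e1, e2) in fenced_lwsync
            if e1 in reads or (e1 in writes and e2 in writes)}

def ab_lw(lwsync, rf, reads, writes):
    """Least fixpoint of the recursive definition of ab-lw."""
    ab = set(lwsync)
    while True:
        a_cumul = {(e1, e2) for (e1, e2) in compose(rf, ab) if e2 in writes}
        b_cumul = {(e1, e2) for (e1, e2) in compose(ab, rf) if e1 in reads}
        new = (a_cumul | b_cumul) - ab
        if not new:
            return ab
        ab |= new

# Events named after isa1: a: W[x]; b: R[x]; c: W[y]; d: R[y]; e: R[x]
reads, writes = {"b", "d", "e"}, {"a", "c"}
rf = {("a", "b"), ("c", "d")}
fenced_pairs = {("b", "c"), ("d", "e")}   # one barrier on P1, one on P2

with_sync = ab_sync(fenced_pairs, rf)
with_lwsync = ab_lw(lwsync_pairs(fenced_pairs, reads, writes), rf, reads, writes)
```

Under this encoding, sync yields the pair (c, e) by A-cumulativity, whereas lwsync does not, because e is a read and the A-cumulativity case of ab-lw requires its target to be a write.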
Finally we define relation $\xrightarrow{\text{ab}}$ as the union of $\xrightarrow{\text{ab-sync}}$ and $\xrightarrow{\text{ab-lw}}$.

Analogy between ppo-ext and B-cumulativity
The $\xrightarrow{\text{ppo-ext}}$ extension is arguably an analog of B-cumulativity. Here we conjecture that barriers implement A-cumulativity by waiting for some stores to be performed globally, in which case $\xrightarrow{\text{ppo}}$ ignores the issue. By contrast, B-cumulativity on a load-store pair demands no specific actions, as the natural consequence of a $w \xrightarrow{\text{rf}} r$ implying that $r$ is performed only once $w$ is issued.

5 Barrier experiments

5.1 Official tests
A programming note in [3, p. 415] describes two examples in precise prose. We formulate those as invalid executions of litmus tests.
isa1
Let us first examine the simpler test isa1, given as pseudo-code at the top of figure 9. The PowerPC documentation states: "Cumulative ordering dictates that the value loaded from location x by processor 2 is 1." Our interest is in an officially invalid execution, which our Power model should also deem invalid (bottom of figure). Thus, we interpret the above prescription as forbidding the value loaded from location x by processor 2 to be 0, the initial contents of x.

To relate an execution graph to a litmus test, one may first relate events to instructions, using the event annotations ((a), (b), ...) in program text and the $\xrightarrow{\text{po}}$ arrows in graphs. For instance, $P_1$ performs a load from location x, reading 1 (event $b$), and a store of 2 to location y (event $c$), and those are separated by a sync instruction (relation $b \xrightarrow{\text{sync}} c$). Other arrows are as follows: dashed arrows give relation $\xrightarrow{\text{rf}}$, with pending arrows to read events being loads from the initial state, and pending arrows from write events being stores to the final state; while bold arrows give relation $\xrightarrow{\text{ghb}}$. For instance, $e \xrightarrow{\text{fr}} a$ results from event $e$ reading the initial value of location x, which is overwritten by event $a$. Or, $a \xrightarrow{\text{ab}} d$ results from barrier cumulativity by $a \xrightarrow{\text{rf}} b \xrightarrow{\text{sync}} c \xrightarrow{\text{rf}} d$.

We can now reach interesting conclusions quite easily: by cor. 1, the execution shown is not SC, since there is a cycle $a \xrightarrow{\text{rf}} b \xrightarrow{\text{po}} c \xrightarrow{\text{rf}} d \xrightarrow{\text{po}} e \xrightarrow{\text{fr}} a$. More importantly, the execution is not valid in the Power model, since there are cycles in $\xrightarrow{\text{ghb}}$

isa1
P0            P1             P2
(a) x ← 1     (b) r1 ← x     (d) r1 ← y
              sync           sync
              (c) y ← 2      (e) r2 ← x
Forbidden: 1:r1=1; 2:r1=2; 2:r2=0;
Events: a: W[x]=1; b: R[x]=1; c: W[y]=2; d: R[y]=2; e: R[x]=0
Figure 9: Officially invalid execution of isa1
[...] cumulative ordering of storage accesses preceding a memory barrier. As a consequence, the cycle $a \xrightarrow{\text{ab}} c \xrightarrow{\text{ab}} e \xrightarrow{\text{fr}} a$ is the most illustrative. Namely, $a \xrightarrow{\text{ab}} c$ follows from $a \xrightarrow{\text{rf}} b$ and $b \xrightarrow{\text{sync}} c$; while $c \xrightarrow{\text{ab}} e$ follows from $c \xrightarrow{\text{rf}} d$ and $d \xrightarrow{\text{sync}} e$.

isa2
The more complex test isa2 (figure 10) is a refinement of isa1: a chain of store-reads from $P_0$ to $P_2$ that passes through $P_1$.

But $P_0$ now performs two stores to locations x and y, separated by a sync; while $P_1$ loops loading y until it reads the value 2 written to y by $P_0$, before storing value 3 to location z. $P_2$ remains essentially unchanged. As for isa1, the architecture specification forbids that $P_2$ loads value 0 from location x (third instruction) when it has loaded (first instruction) the value (here 3) stored by $P_1$ in some memory location used for communicating (here z). A key observation is the absence of a barrier in the $P_1$ code. Instead, we have a control dependency. The example being official, we assume that such a control dependency suffices to prevent the last load of $P_2$ from reading value 0.

isa2
P0            P1                P2
(a) x ← 1     L1:               (f) r3 ← z
sync          (c, d) r2 ← y     sync
(b) y ← 2     cmp r2,2          (g) r1 ← x
              bne L1
              (e) z ← 3
Forbidden: 2:r3=3; 2:r1=0;
Events: a: W[x]=1; b: W[y]=2; c: R[y]=0; d: R[y]=2; e: W[z]=3; f: R[z]=3; g: R[x]=0
[Figure 10]

In presence of a conditional branch, there is a clear distinction between program text and execution, or, more precisely, between program listing order and program order $\xrightarrow{\text{po}}$. We select a particular (invalid) execution witness generated by memevent, where $P_1$ executes two loop iterations (figure 10). The control dependency is expressed as the two edges $c \xrightarrow{\text{ppo}} e$ and $d \xrightarrow{\text{ppo}} e$. The execution is non-SC, by the existence of the cycle $a \xrightarrow{\text{po}} b \xrightarrow{\text{rf}} d \xrightarrow{\text{po}} e \xrightarrow{\text{rf}} f \xrightarrow{\text{po}} g \xrightarrow{\text{fr}} a$. The execution is also invalid in our Power model, since $\xrightarrow{\text{ghb}}$ is cyclic. We clearly identify two cycles: $a \xrightarrow{\text{ab}} d \xrightarrow{\text{ppo}} e \xrightarrow{\text{ab}} g \xrightarrow{\text{fr}} a$ and $a \xrightarrow{\text{ab}} d \xrightarrow{\text{ppo-ext}} f \xrightarrow{\text{sync}} g \xrightarrow{\text{fr}} a$. Note that the $\xrightarrow{\text{ppo-ext}}$ extension is not needed to conclude that this outcome is invalid in our Power model.

5.2 Classical tests
In the previous section, we have demonstrated that our Power model is correct w.r.t. the two official litmus tests that are publicly available. Clearly, two tests are insufficient to draw any conclusion and we need more.

Some litmus tests are conventional, such as iriw (Independent Reads of Independent Writes, figure 11) and rwc (Read-to-Write Causality, figure 12); see for instance [13] and [4, Example 7.7].

Figures 11 and 12 show non-SC execution witnesses, which are the ones of interest. To see that the executions we consider are non-SC, it suffices to follow $\xrightarrow{\text{rf}}$, $\xrightarrow{\text{fr}}$ and $\xrightarrow{\text{po}}$ arrows in any graph, so as to find a cycle. These graphs also show that the executions considered are invalid in our Power model, by the presence of bold $\xrightarrow{\text{ghb}}$ cycles.

We cannot conclude from the public documentation [5] whether these two tests are invalid on the Power architecture; therefore we resort to experimentation.
5.3 Experiments
In experiments, we observe a selection of the final values of registers and of memory locations, yielding outcomes. In the case of the four litmus tests isa1 to rwc, the final values of registers written to by the load instructions suffice to identify the non-SC execution depicted. For instance, the outcome [1:r1=1; 1:r2=0; 2:r3=0;] suffices to identify the non-SC execution of rwc.

We performed experiments on two machines, doko and hpcx. doko is a 4-core Power5 machine, running Linux; while hpcx is a 16-core eServer 575, running AIX.