Fences in Weak Memory Models

HAL Id: inria-00408568

https://hal.inria.fr/inria-00408568

Submitted on 3 Aug 2009

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Jade Alglave, Luc Maranget

To cite this version:

Jade Alglave, Luc Maranget. Fences in Weak Memory Models. [Research Report] RR-7010, INRIA. 2009. ⟨inria-00408568⟩

Research report

ISSN 0249-6399

ISRN INRIA/RR--7010--FR+ENG

Fences in Weak Memory Models

Jade Alglave — Luc Maranget

N° 7010

July 2009

Centre de recherche INRIA Paris – Rocquencourt

Jade Alglave, Luc Maranget

Project-Team Moscova

Research report n° 7010, July 2009, 39 pages

Abstract: We present here an axiomatic framework, implemented in the Coq proof assistant, for defining weak memory models in terms of several parameters: local reorderings of reads and writes, and visibility of inter- and intra-processor communications through memory. In this context, we provide a formal definition of weak memory models induced by architectures, illustrated by definitions of $SC$ and Sparc $TSO$. Moreover, we define a comparison over architectures, an architecture $A_1$ being weaker than another one $A_2$ when $A_1$ allows more behaviours than $A_2$. In addition, we provide a characterisation of behaviours allowed by $A_1$ which are also valid on $A_2$. By that means, we provide a simple characterisation of SC and TSO behaviours on any weaker architecture. We also provide an abstract notion of what should be the action and placement of fences to restore a given model from a weaker one.


Contents

1 Introduction
    1.1 An axiomatic generic model
    1.2 Study of barriers power
    1.3 Case study: a Power model
2 Description of the model
    2.1 Axiomatisation
        2.1.1 Basic objects
        2.1.2 Execution witnesses
        2.1.3 Architectures
        2.1.4 Validity of an execution with respect to an architecture
        2.1.5 Examples
    2.2 Comparison of architectures
        2.2.1 Making validity monotonous
        2.2.2 Examples
    2.3 Equivalence with native models
        2.3.1 Sc is SC
        2.3.2 Tso is TSO
    2.4 Testing
        2.4.1 Tools
        2.4.2 Comparison of models
        2.4.3 Characteristic tests
3 Semantics of barriers
    3.1 Barriers guarantee
    3.2 Considering a weaker guarantee
4 Case study: a Power model
    4.1 Complete event structures and execution witnesses
    4.2 Globality of rf maps
    4.3 Preserved program order ppo
    4.4 Values do not come out of thin air
    4.5 Cumulative memory barriers
5 Barrier experiments
    5.1 Official tests
    5.2 Classical tests
    5.3 Experiments
6 Towards a stronger model
    6.1 Extension ppo-ext
    6.2 Semantics of lwsync
7 Conclusion
    7.1 Contribution
    7.2 Status of writes

1 Introduction

Memory models describe and constrain the behaviour of a program running on a multiprocessor. Understanding what a program would do on such a machine therefore requires a precise definition of the memory model induced by the machine, that is, the underlying memory system and the behaviour of the processors involved. Previous studies [14, 18] have discussed the need for a rigorous definition of weak memory models, which some of the public documentations [3, 4] lack. We provide here a generic and axiomatic framework to precisely define a memory model in terms of several parameters and test it against real hardware.

Let us consider a shared-memory multiprocessor system, which consists of several processors writing to or reading from a common shared memory. We discuss here what representation of memory and processor behaviour we consider.

Representation of memory One representation of a shared memory could be a single memory on which several processors operate simultaneously, all their writes being committed to memory as soon as they are issued. Thus, one can consider the connection between processors and memory as direct: as soon as one processor writes to memory, the value written overwrites the previous value and is immediately available to all processors. This property, called store atomicity, has been examined and advocated as valuable [16, 8, 11] as it provides the guarantee that actions on such a memory are serialisable, which leads to a rather understandable memory model. However, it is not guaranteed on several real architectures [1, 10, 3, 4]. They indeed relax the store atomicity constraint, which means a write is not available to all processors at once. For example, a write is at first initiated by a given processor, then committed to a cache, and finally to memory. This last step is sometimes called globally performed [15]. Even without assuming writes to be committed immediately, we suppose a total order on the globally performed writes to the same location, a property sometimes called coherence [5] that is widely assumed by modern architectures [1, 10, 3, 4].

Processor behaviour One representation of a processor behaviour could suppose a sequential order, consistent with the program order, of all the read and write events issued by a given processor, as a generalisation of the uniprocessor case. However, modern architectures [1, 10, 3, 4] provide relaxed memory models that do not constrain the way reads and writes are ordered that much. These constraints, or their relaxation, are often gathered behind the term instruction reordering [8, 11].

1.1 An axiomatic generic model

We will precisely define an architecture in terms of its ordering and store atomicity relaxations at section 2.1.3. For example, Sequential Consistency (henceforth $SC$) [17] supposes writes to be committed to memory as soon as they are issued, and that the program order is maintained between all accesses, thus being the strongest (in a sense that will be defined precisely at section 2.2) memory model. We will illustrate how to instantiate our model to produce $SC$ and Sparc $TSO$ [1], and show equivalence with the native models, together with a characterisation of executions that would be valid on these models.

    P0                P1
    (a) x ← 1         (c) y ← 1
    (b) r2 ← y        (d) r4 ← x
    i3    r2 = 0 ∧ r4 = 0 ?

Figure 1: i3 exhibits non-SC behaviour on modern architectures

1.2 Study of barriers power

$SC$ indeed provides a rather comfortable programming model, which explains why most architectures provide mechanisms such as barriers and locks to restore it from a weaker model. However, it is not clear how much power a barrier needs to provide the illusion of $SC$, nor where to place these constructions in the code. We examine this question at section 3.1 from a general point of view: we provide a sufficient condition on barriers to restore a stronger model from a weaker one. Moreover, we refine this condition in some particular yet interesting cases at section 3.2, such as $TSO$ [1].

1.3 Case study: a Power model

Our generic framework, implemented in the Coq proof assistant [12], has two companion tools: memevents, written in OCaml, which is an exact implementation of our axiomatic model, and litmus, which runs on real hardware the same inputs that memevents takes.

We provide a series of tests to properly instantiate our model with respect to a given machine or architecture, which allowed us to design a model for a significant fragment of the Power architecture with barriers.

2 Description of the model

2.1 Axiomatisation

The classical test depicted at fig. 1, which can be found in [6] with number 2.3a, illustrates the fact that we cannot use an interleaving semantics to reason on executions induced by weak memory models, as it exhibits a non-SC behaviour on some current architectures [6, 5]. Instead, we reason on relations over read and write events raised by an instruction.

2.1.1 Basic objects

An event is an abstraction of a memory access performed during the execution of a multiprocessor program. We note $\mathbb{E}$ the set of events generated by a particular execution. Events are of two kinds: reads and writes, which sets will be denoted by $\mathbb{R}$ and $\mathbb{W}$. Henceforth, we will note $e$ for an event, $r$ for a read, and $w$ for a write. An event $e$ holds its direction ($R$ or $W$), its location, given by $loc\ e$, its value, given by $val\ e$, and its processor, given by $proc\ e$. We will note $(a)\ x \leftarrow v$ for a write to location $x$ with value $v$ labelled $(a)$, and $(b)\ r1 \leftarrow y$ for a read from $y$ labelled $(b)$.

Figure 2: Event structures for test i3. The four event structures share the events and program order below and differ only in the values read:

    P0 (po:0): a: W[x]=1, then b: R[y]
    P1 (po:1): c: W[y]=1, then d: R[x]

    b: R[y]=0, d: R[x]=0    (0:r1=0; 1:r2=0)
    b: R[y]=0, d: R[x]=1    (0:r1=0; 1:r2=1)
    b: R[y]=1, d: R[x]=0    (0:r1=1; 1:r2=0)
    b: R[y]=1, d: R[x]=1    (0:r1=1; 1:r2=1)

An execution is also characterised by the program order $\xrightarrow{po}$, a relation on events that reflects the sequential execution of instructions on a single processor: given two instructions $i_1$ and $i_2$ that generate events $e_1$ and $e_2$, having $e_1 \xrightarrow{po} e_2$ simply means that $i_1$ precedes $i_2$ in program order. $\xrightarrow{po}$ is a total order amongst the events from the same processor¹ and never relates events from different processors.

We collect this information into an event structure, denoted by $E$:

\[ E \triangleq (\mathbb{E}, \xrightarrow{po}) \]

Figure 2 illustrates the event structures associated to the test i3 depicted at fig. 1.

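2.1.2 Execution witnesses

We postulate two relations over events: $\xrightarrow{rf}$ and $\xrightarrow{ws}$.

Rf A read-from map links a read event with the write event that provides its value. We represent the notion by a relation from writes to reads, which is well-formed in the following sense:

\[ \xrightarrow{prf} \;\triangleq\; \{(w, r) \mid \exists l\, v,\ w \in \mathbb{W}_{l,v} \wedge r \in \mathbb{R}_{l,v}\} \]

\[ wf_{rf}(\xrightarrow{rf}) \;\triangleq\; (\xrightarrow{rf} \subseteq \xrightarrow{prf}) \wedge \forall r,\ \exists!\, w,\ w \xrightarrow{rf} r \]

We first gather all pairs of writes and reads with the same location $l$ and value $v$, which sets are denoted by $\mathbb{W}_{l,v}$ and $\mathbb{R}_{l,v}$, and then enforce the uniqueness of read sources.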

¹ When some instructions may perform several memory accesses, $\xrightarrow{po}$ should include some intra-instruction dependencies [18], thus becoming a partial order on events from a same processor.

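Ws The write serialisation is a total order of the writes to a same location. Thus, we first gather all pairs of writes to the same location, and we require the relation to be a total order on the writes to a same location $l$, which set will be denoted by $\mathbb{W}_l$:

\[ \xrightarrow{pws} \;\triangleq\; \{(w_1, w_2) \mid \exists l,\ w_1 \in \mathbb{W}_l \wedge w_2 \in \mathbb{W}_l\} \]

\[ wf_{ws}(\xrightarrow{ws}) \;\triangleq\; (\xrightarrow{ws} \subseteq \xrightarrow{pws}) \wedge \forall l,\ \text{total order}\ (\xrightarrow{ws} \upharpoonright \mathbb{W}_l)\ \mathbb{W}_l \]

The notation $\xrightarrow{ws} \upharpoonright \mathbb{W}_l$ stands for the restriction of the relation $\xrightarrow{ws}$ to the set $\mathbb{W}_l$, i.e. $\xrightarrow{ws} \cap\, (\mathbb{W}_l \times \mathbb{W}_l)$.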

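Fr From these two relations, we deduce a third one, $\xrightarrow{fr}$:

\[ r \xrightarrow{fr} w \;\triangleq\; \exists w',\ w' \xrightarrow{rf} r \wedge w' \xrightarrow{ws} w \]

That is, the three events form a triangle: $w' \xrightarrow{rf} r$, $w' \xrightarrow{ws} w$ and $r \xrightarrow{fr} w$.

As we said, $\xrightarrow{ws}$ orders globally performed writes to the same location; thus, if a write $w'$ is before another write $w$ in $\xrightarrow{ws}$, we know that $w'$ is globally (that is, for every processor) before $w$. Furthermore, if a read $r$ reads from $w'$, we consider $r$ to be globally ordered before the following write $w$: otherwise, there would be no guarantee that $r$ actually read its value from $w'$, thus contradicting the $\xrightarrow{rf}$ relation between them.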
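The derivation of fr from rf and ws can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Coq or OCaml code of the report: relations are modelled as sets of pairs, and the function name `from_reads` and the event names `ix` and `iy` (standing for the initial writes to x and y) are hypothetical.

```python
def from_reads(rf, ws):
    """Return fr = {(r, w) | there is w0 with (w0, r) in rf and (w0, w) in ws}."""
    return {(r, w) for (w0, r) in rf for (w1, w) in ws if w0 == w1}


# Execution of test i3 in which both loads read the initial values.
# "ix" and "iy" are hypothetical names for the initial writes to x and y.
rf = {("ix", "d"), ("iy", "b")}   # d reads x from ix, b reads y from iy
ws = {("ix", "a"), ("iy", "c")}   # initial writes come first in ws
fr = from_reads(rf, ws)
print(sorted(fr))
```

On this execution the sketch yields two fr edges, from b to c and from d to a: each load read a value that is later overwritten by the store of the other processor.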

Execution witnesses We gather these relations (except $\xrightarrow{fr}$, as it can be deduced from the others) into an execution witness, denoted by $X$:

\[ X \triangleq (\xrightarrow{rf}, \xrightarrow{ws}) \]

Figure 3 adds $\xrightarrow{rf}$ and $\xrightarrow{fr}$ edges to the event structures of figure 2. There are no $\xrightarrow{ws}$ edges among the (non-initialisation) writes shown. However, we see some $\xrightarrow{fr}$ arrows, which follow from the serialisation of init stores (which come first in $\xrightarrow{ws}$) and of stores generated by instructions. For instance, in the leftmost picture, we have $d \xrightarrow{fr} a$. Indeed, the load $d$ reads the initial value of location $x$, which is overwritten (later!) by the store $a$.

We have the associated well-formedness predicate $wf$, being the conjunction of the predicates for $\xrightarrow{rf}$ and $\xrightarrow{ws}$.

Initial and final states The write serialisation provides a natural way to define the initial and final states of an execution:

\[ init\ X \;\triangleq\; \{w \mid \neg(\exists w',\ w' \xrightarrow{ws} w)\} \]

\[ final\ X \;\triangleq\; \{w \mid \neg(\exists w',\ w \xrightarrow{ws} w')\} \]

Figure 3: Execution witnesses for i3 (the event structures of figure 2 with rf and fr edges added; reads of value 0 take their rf edge from the initial writes):

    0:r1=0; 1:r2=0    b fr→ c, d fr→ a
    0:r1=0; 1:r2=1    b fr→ c, a rf→ d
    0:r1=1; 1:r2=0    c rf→ b, d fr→ a
    0:r1=1; 1:r2=1    c rf→ b, a rf→ d

2.1.3 Architectures

We define here what we consider to be an architecture.

Preserved program order We assume a function $ppo$, which gathers all pairs of events that are not to be reordered with respect to the program order $\xrightarrow{po}$. Consider for example the test i3, depicted at fig. 1: the specified outcome would be valid only if writes and reads to different locations could be reordered. Thus, an architecture that would authorise the specified outcome would not include write-read pairs in its preserved program order.

We will note $\xrightarrow{ppo}$ for the relation output by this function on a given event structure $E$, which is to be included in $\xrightarrow{po}$. This relation is to be considered global, that is, all processors must behave with respect to the constraints induced by it.

Globality of relations As stated in the introduction, we consider writes to be non-atomic, that is, not necessarily available to all parts of the memory system at once. Thus, the behaviour of all processors does not necessarily include the constraints induced by $\xrightarrow{rf}$ relations. However, we distinguish the constraints induced by internal rf (rf relation on a same processor) and external rf (from one processor to another). Thus, we split the $\xrightarrow{rf}$ relation into $\xrightarrow{rfi}$, which represents the events in $\xrightarrow{rf}$ on the same processor, and $\xrightarrow{rfe}$, which represents the events in $\xrightarrow{rf}$ on different processors:

\[ w \xrightarrow{rfi} r \;\triangleq\; w \xrightarrow{rf} r \wedge proc\ w = proc\ r \]

\[ w \xrightarrow{rfe} r \;\triangleq\; w \xrightarrow{rf} r \wedge proc\ w \neq proc\ r \]
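The split of rf into its internal and external parts can be illustrated with a short Python sketch. It is not taken from the report's tools: the function name `split_rf`, the dictionary `proc` mapping events to processors, and the three-event example are assumptions made for the illustration.

```python
def split_rf(rf, proc):
    """Split rf into (rfi, rfe) according to the processors of both events."""
    rfi = {(w, r) for (w, r) in rf if proc[w] == proc[r]}
    rfe = {(w, r) for (w, r) in rf if proc[w] != proc[r]}
    return rfi, rfe


# Hypothetical example: the write a on P0 is read by b on P0 and by c on P1.
proc = {"a": 0, "b": 0, "c": 1}
rf = {("a", "b"), ("a", "c")}
rfi, rfe = split_rf(rf, proc)
print(rfi, rfe)
```

By construction the two parts are disjoint and their union is rf again, so an architecture can decide independently whether each part is global.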

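Relations induced by the presence of barriers We assume given a function $ab$, which, provided an event structure $E$ and an execution witness $X$, defines the relation over events induced by the presence of a barrier between two instructions in $\xrightarrow{po}$:

\[ ab : \mathcal{E} \to \mathcal{X} \to rln\ \mathbb{E} \]

where $\mathcal{E}$ (resp. $\mathcal{X}$) is the type of event structures (resp. execution witnesses). This information is what defines for us an architecture, denoted by $A$:

Definition 1 (Architecture)

\[ A \triangleq (ppo, int, ext, ab) \]

where the boolean $int$ (resp. $ext$) states whether $\xrightarrow{rfi}$ (resp. $\xrightarrow{rfe}$) is considered global.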

2.1.4 Validity of an execution with respect to an architecture

We define here what it means for an execution witness $X$ to be valid on a given architecture $A$.

Uniprocessor behaviour Some documentations [3] claim that a sole processor is supposed to respect the sequential execution model, that is:

    the model of program execution in which the processor appears to execute one instruction at a time, completing each instruction before beginning to execute the next instruction

Following Alpha [10], we define the processor issue order, denoted by the $\xrightarrow{pio}$ relation, as follows:

\[ e_1 \xrightarrow{pio} e_2 \;\triangleq\; e_1 \xrightarrow{po} e_2 \wedge loc\ e_1 = loc\ e_2 \]

We call $\xrightarrow{hb}$ the union of the three relations $\xrightarrow{rf}$, $\xrightarrow{ws}$ and $\xrightarrow{fr}$:

\[ \xrightarrow{hb} \;\triangleq\; \xrightarrow{rf} \cup \xrightarrow{ws} \cup \xrightarrow{fr} \]

Notice that $\xrightarrow{hb}$ is not the proper happens-before relation in the general case, but rather the happens-before of a memory with multi-copy-atomic writes. We define the general happens-before relation in the next section.

To provide our executions the guarantee that they respect the sequential execution model, we require that all the relations $\xrightarrow{rf}$, $\xrightarrow{ws}$ and $\xrightarrow{fr}$ are consistent with the processor issue order, that is:

\[ uniproc \;\triangleq\; acyclic\ (\xrightarrow{hb} \cup \xrightarrow{pio}) \]

Figure 4 gives an example of an outcome that is forbidden because of $uniproc$. There are two executions for this outcome, with different write serialisations: $a \xrightarrow{ws} b$ on the left, and $b \xrightarrow{ws} a$ on the right. In the former case, we have $c \xrightarrow{fr} b$ (by $a \xrightarrow{ws} b$ and $a \xrightarrow{rf} c$). Thus, invalidation follows from the cycle $b \xrightarrow{pio} c \xrightarrow{fr} b$. In the latter case, the cycle is $a \xrightarrow{rf} c \xrightarrow{pio} d \xrightarrow{fr} a$, the last step following from $b \xrightarrow{ws} a$ and $b \xrightarrow{rf} d$.

    P0                P1
    (a) x ← 1         (b) x ← 2
                      (c) r2 ← x
                      (d) r3 ← x
    Forbidden: 1:r2=1; 1:r3=2

    Left:  a ws→ b, a rf→ c, b rf→ d, c fr→ b    (cycle b pio→ c fr→ b)
    Right: b ws→ a, a rf→ c, b rf→ d, d fr→ a    (cycle a rf→ c pio→ d fr→ a)

Figure 4: Invalid executions by $uniproc$.
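The uniproc check on the two executions of Figure 4 can be replayed with a small Python sketch. It assumes relations are sets of pairs over the event names a, b, c, d of the figure, with b, c and d on the same processor; the helper names `acyclic` and `uniproc` are chosen for the illustration and are not the identifiers of the report's tools.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    edges = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        # Nodes without incoming edges cannot belong to a cycle: peel them off.
        sources = {n for n in nodes if all(d != n for (_, d) in edges)}
        if not sources:
            return False  # every remaining node has a predecessor: there is a cycle
        nodes -= sources
        edges = {(s, d) for (s, d) in edges if s not in sources}
    return True


def uniproc(rf, ws, fr, pio):
    """acyclic(hb U pio), with hb = rf U ws U fr."""
    return acyclic(rf | ws | fr | pio)


# b, c and d access x on the same processor, in this program order.
pio = {("b", "c"), ("c", "d"), ("b", "d")}
rf = {("a", "c"), ("b", "d")}                                # outcome r2=1, r3=2

left = uniproc(rf, {("a", "b")}, {("c", "b")}, pio)          # a ws-> b
right = uniproc(rf, {("b", "a")}, {("d", "a")}, pio)         # b ws-> a
# For contrast: both loads read 1 from a, with b ws-> a (no fr edge arises).
allowed = uniproc({("a", "c"), ("a", "d")}, {("b", "a")}, set(), pio)
print(left, right, allowed)
```

Both write serialisations of the forbidden outcome fail the check, whereas the contrasting execution, where the processor's own write is overwritten before both loads read, passes it.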

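All together We call $\xrightarrow{ghb}$ the union of the relations that are global:

\[ \xrightarrow{ghb} \;\triangleq\; \xrightarrow{ppo} \cup \xrightarrow{ws} \cup \xrightarrow{fr} \cup \xrightarrow{rf?} \cup \xrightarrow{ab} \qquad \text{with} \qquad \xrightarrow{rf?} \;\triangleq\; \xrightarrow{rfi?} \cup \xrightarrow{rfe?} \]

where $\xrightarrow{rfi?}$ (resp. $\xrightarrow{rfe?}$) is $\xrightarrow{rfi}$ (resp. $\xrightarrow{rfe}$) if $int$ (resp. $ext$) is $true$, the empty relation otherwise.

We can now define what a valid execution is, with respect to an architecture $A$:

Definition 2 (Valid execution)

\[ A.valid\ E\ X \;\triangleq\; wf \wedge uniproc \wedge acyclic\ (\xrightarrow{ghb}) \]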
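The role of the architecture parameters in ghb can be illustrated with a Python sketch on the execution of test i3 where both loads read 0. Several things here are assumptions made for the example: relations are sets of pairs, the initial writes are left out (they add no cycle), only the acyclicity part of the validity predicate is checked (wf and uniproc are not), and the names `ghb` and `acyclic` are illustrative.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    edges = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if all(d != n for (_, d) in edges)}
        if not sources:
            return False
        nodes -= sources
        edges = {(s, d) for (s, d) in edges if s not in sources}
    return True


def ghb(ppo, ws, fr, rfi, rfe, ab, int_global, ext_global):
    """ppo U ws U fr U rf? U ab, where rfi/rfe count only if declared global."""
    rel = set(ppo) | set(ws) | set(fr) | set(ab)
    if int_global:
        rel |= set(rfi)
    if ext_global:
        rel |= set(rfe)
    return rel


# Test i3, outcome r2 = 0 and r4 = 0: a: W[x], b: R[y] on P0; c: W[y], d: R[x] on P1.
po = {("a", "b"), ("c", "d")}
fr = {("b", "c"), ("d", "a")}

# Architecture preserving the whole program order, with global rf.
strong = acyclic(ghb(po, set(), fr, set(), set(), set(), True, True))
# Architecture whose ppo drops write-read pairs: both po pairs of i3 disappear.
relaxed = acyclic(ghb(set(), set(), fr, set(), set(), set(), True, True))
print(strong, relaxed)
```

With the full program order the four events form the cycle a, b, c, d, a, so the execution is rejected; once write-read pairs are dropped from ppo, the same witness passes the acyclicity check.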

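Weak Memory Models Let $\mathcal{W}$ be the type of memory models, defined as follows:

\[ \mathcal{W} \;\triangleq\; \mathcal{E} \to \mathcal{X} \to \{\top, \bot\} \]

Thus, we define a function $Wmm$ which, $\mathcal{A}$ being the type of architectures, produces the weak memory model induced by an architecture $A$:

Definition 3 (Weak Memory Model)

\[ Wmm : \mathcal{A} \to \mathcal{W} \]

\[ Wmm(A) \;\triangleq\; \forall\, E\ X,\ A.valid\ E\ X \]

Henceforth, we will note $A.Wmm$ for $Wmm(A)$.

2.1.5 Examples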

We will show how to produce a particular model from our generic framework on two classical memory models, Sequential Consistency [17], later on referred to as $SC$, and $TSO$ [1], thus illustrating the concepts we used to define our framework. We will show at section 2.3 that these definitions are equivalent to the native ones.

Sequential consistency $SC$ has been defined by Lamport as follows:

    The result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. [17]

We give here a formal definition of an $SC$ execution. We need at first a sequential execution $\xrightarrow{ex}$, that is, a total order consistent with the program order:

\[ seq(\xrightarrow{ex}) \;\triangleq\; \text{total order}\ (\xrightarrow{ex})\ \mathbb{E} \wedge (\xrightarrow{po} \subseteq \xrightarrow{ex}) \]

We need to highlight the implicit execution model, which states that a read $r$ reads from the most recent write that is before it in $\xrightarrow{ex}$. Let us note $pw_o(r)$ the set of previous writes for $r$ in a partial order $o$, to define the $\xrightarrow{rf}$ relation for an $SC$ execution, that is, which read reads from which write:

\[ SC.rf(\xrightarrow{ex}) \;\triangleq\; \{(w, r) \mid w \xrightarrow{prf} r \wedge w = \max\, pw_{ex}(r)\} \]

Thus a valid $SC$ execution will be given by a sequential execution $\xrightarrow{ex}$ and the calculation of its induced $\xrightarrow{rf}$ relation as above.

From such an execution, we can produce an execution witness:

\[ SC.ws(\xrightarrow{ex}) \;\triangleq\; \{(w_1, w_2) \mid w_1 \xrightarrow{ex} w_2 \wedge w_1 \xrightarrow{pws} w_2\} \]

\[ SC.wit(\xrightarrow{ex}) \;\triangleq\; (SC.rf(\xrightarrow{ex}),\ SC.ws(\xrightarrow{ex})) \]

We propose here an alternative notion of $SC$, which we will show equivalent to the native one in thm. 3:

\[ Sc.Arch \;\triangleq\; (\xrightarrow{po},\ true,\ true,\ \xrightarrow{ab}) \]

\[ Sc.Wmm \;\triangleq\; Wmm(Sc.Arch) \]

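TSO To design a proper TSO execution, we need to require what the Sparc documentation [1] specifies:

\[ R{*} \;\triangleq\; \{(r, e) \mid r \xrightarrow{po} e\} \]

\[ WW \;\triangleq\; \{(w_1, w_2) \mid w_1 \xrightarrow{po} w_2\} \]

\[ ptso(\xrightarrow{ex}) \;\triangleq\; \text{partial order}\ (\xrightarrow{ex})\ \mathbb{E} \wedge (R{*} \subseteq \xrightarrow{ex}) \wedge (WW \subseteq \xrightarrow{ex}) \wedge \exists \xrightarrow{tso},\ (\xrightarrow{tso} \subseteq \xrightarrow{ex}) \wedge \text{total order}\ (\xrightarrow{tso})\ \mathbb{W} \]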

Moreover, we need to highlight the explicit execution model, provided by the $Val$ axiom in the documentation:

\[ Val(L_a) = Val(\max_{ex} \{S_a \mid S_a \xrightarrow{ex} L_a \vee S_a \xrightarrow{po} L_a\}) \]

which states that a read $r$ reads from the most recent write that is before it in $\xrightarrow{ex} \cup \xrightarrow{po}$. Thus we define the $\xrightarrow{rf}$ relation for a $TSO$ execution, that is, which read reads from which write:

\[ TSO.rf(\xrightarrow{ex}) \;\triangleq\; \{(w, r) \mid w \xrightarrow{prf} r \wedge w = \max(pw_{(ex \cup po)}(r))\} \]

As in the $SC$ case, we produce an execution witness:

\[ TSO.ws(\xrightarrow{ex}) \;\triangleq\; \{(w_1, w_2) \mid w_1 \xrightarrow{pws} w_2 \wedge w_1 \xrightarrow{ex} w_2\} \]

\[ TSO.wit(\xrightarrow{ex}) \;\triangleq\; (TSO.rf(\xrightarrow{ex}),\ TSO.ws(\xrightarrow{ex})) \]

We propose here an alternative notion of $TSO$, which we will show equivalent to the native one in thm. 4:

\[ ppo\_tso \;\triangleq\; R{*} \cup WW \]

\[ Tso.Arch \;\triangleq\; (ppo\_tso,\ false,\ true,\ ab) \]

\[ Tso.Wmm \;\triangleq\; Wmm(Tso.Arch) \]

The $ppo$ is quite clear from the documentation. The $Val$ axiom indicates that the internal $\xrightarrow{rf}$ are not included in $\xrightarrow{ex}$, whereas the external ones are, as the write from which a read reads is the $\max$ of its previous writes in $\xrightarrow{ex}$. Thus we consider $\xrightarrow{rfe}$ to be global, whereas $\xrightarrow{rfi}$ is not.

2.2 Comparison of architectures

From our definition of architecture arises a very simple notion of comparison; we define the predicate weaker among architectures as follows:

Definition 4 (Weaker)

\[ A_1 \leq A_2 \;\triangleq\; (ppo_1 \subseteq ppo_2) \wedge (int_1 \to int_2) \wedge (ext_1 \to ext_2) \wedge (ab_1 \subseteq ab_2) \]

Theorem 1 (Validity is decreasing)

\[ \forall A_1\, A_2,\ A_1 \leq A_2 \to \forall E\, X,\ A_2.valid\ E\ X \to A_1.valid\ E\ X \]

Proof [in Coq] From $A_1 \leq A_2$, we have $A_1.ghb \subseteq A_2.ghb$, thus if $A_2.ghb$ is acyclic, so is $A_1.ghb$.

2.2.1 Making validity monotonous

We define here a criterion to check if an execution $X$ running on an architecture $A_1$ would be valid on a stronger architecture $A_2$:

\[ A_1.check_{A_2} \;\triangleq\; acyclic\ (A_2.ghb) \]

We show that this criterion characterises an execution running on $A_1$ that would be valid on $A_2$:

Theorem 2 (Characterisation)

\[ \forall A_1\, A_2,\ A_1 \leq A_2 \to \forall E\, X,\ (A_1.valid\ E\ X \wedge A_1.check_{A_2}\ E\ X) \leftrightarrow A_2.valid\ E\ X \]

Proof [in Coq]

⇒ $X$ being valid on $A_1$, we have all requirements (well-formedness and uniproc) to guarantee it is valid on $A_2$, except the last predicate, which holds by the hypothesis $check_{A_2}$.

⇐ $X$ being valid on $A_2$ gives us all requirements (well-formedness and uniproc) to guarantee its validity on $A_1$ except the last one. As $A_1 \leq A_2$, we know that $A_1.ghb \subseteq A_2.ghb$ (lemma ghb_incl), thus the acyclicity requirement for $A_1.ghb$ holds if $A_2.ghb$ is acyclic. □

2.2.2 Examples

Sc In the context of our generic framework, we designed a criterion to decide if a particular execution $X$, with respect to an event structure $E$ and on an architecture $A$, is Sc:

\[ A.check_{Sc} \;\triangleq\; acyclic\ (\xrightarrow{po} \cup \xrightarrow{hb}) \]

This criterion characterises valid weak executions that are Sc:

Corollary 1 (Sc characterisation)

\[ \forall A\, E\, X,\ A \leq Sc \to (A.valid\ E\ X \wedge A.check_{Sc}\ E\ X) \leftrightarrow Sc.valid\ E\ X \]

Proof [in Coq]

⇒ As $\xrightarrow{po} \cup \xrightarrow{hb} = Sc.ghb$, this is a direct consequence of thm. 2.

⇐ As $A \leq Sc$, this is a direct consequence of thm. 1. □

This result allows us to see that the outcome 0:r1=0; 1:r2=0 for i3 (leftmost picture in figure 3) will never show up on a sequentially consistent machine. All other executions depicted in fig. 3 are $SC$ by the same argument.
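The Sc criterion can be tried on the four execution witnesses of test i3 with a short Python sketch. It rests on assumptions made for the illustration: relations are sets of pairs over the events a, b (on P0) and c, d (on P1), the initial writes are omitted since they add no cycle, and `check_sc` and `acyclic` are illustrative names rather than the report's identifiers.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    edges = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if all(d != n for (_, d) in edges)}
        if not sources:
            return False
        nodes -= sources
        edges = {(s, d) for (s, d) in edges if s not in sources}
    return True


def check_sc(po, rf, ws, fr):
    """acyclic(po U hb), with hb = rf U ws U fr."""
    return acyclic(po | rf | ws | fr)


po = {("a", "b"), ("c", "d")}
# Outcome (r1, r2) -> (rf, fr) among the non-initial events of i3.
witnesses = {
    (0, 0): (set(), {("b", "c"), ("d", "a")}),
    (0, 1): ({("a", "d")}, {("b", "c")}),
    (1, 0): ({("c", "b")}, {("d", "a")}),
    (1, 1): ({("c", "b"), ("a", "d")}, set()),
}
results = {out: check_sc(po, rf, set(), fr) for out, (rf, fr) in witnesses.items()}
print(results)
```

Only the witness for the outcome where both registers hold 0 fails the check, through the cycle a, b, c, d, a; the three others pass.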

Tso In the context of our generic framework, we designed a criterion to decide if a particular execution $X$, with respect to an event structure $E$ and on an architecture $A$, is Tso; consider $\xrightarrow{hb\_tso}$ to be $\xrightarrow{ws} \cup \xrightarrow{fr} \cup \xrightarrow{rfe}$:

\[ A.check_{Tso} \;\triangleq\; acyclic\ (\xrightarrow{ppo\_tso} \cup \xrightarrow{hb\_tso}) \]

This criterion characterises valid weak executions that are Tso:

Corollary 2 (Tso characterisation)

\[ \forall A\, E\, X,\ A \leq Tso \to (A.valid\ E\ X \wedge A.check_{Tso}\ E\ X) \leftrightarrow Tso.valid\ E\ X \]

Proof [in Coq]

⇒ As $\xrightarrow{ppo\_tso} \cup \xrightarrow{hb\_tso} = Tso.ghb$, this is a direct consequence of thm. 2.

⇐ As $A \leq Tso$, this is a direct consequence of thm. 1. □

This result allows us to conclude that all the outcomes for i3 specified in fig. 3 may show up on a $Tso$ machine.
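The same four witnesses of test i3 can be run through the Tso criterion in a Python sketch. As before, this is an illustration under assumptions: relations are sets of pairs, initial writes are omitted, every rf edge of i3 crosses processors (so it is an rfe edge), and the names `ppo_tso`, `check_tso` and `direction` are hypothetical.

```python
def acyclic(edges):
    """True if the directed graph given as a set of (src, dst) pairs has no cycle."""
    edges = set(edges)
    nodes = {n for e in edges for n in e}
    while nodes:
        sources = {n for n in nodes if all(d != n for (_, d) in edges)}
        if not sources:
            return False
        nodes -= sources
        edges = {(s, d) for (s, d) in edges if s not in sources}
    return True


def ppo_tso(po, direction):
    """Keep the R* and WW pairs of po; write-read pairs are dropped."""
    return {(e1, e2) for (e1, e2) in po
            if direction[e1] == "R" or direction[e2] == "W"}


def check_tso(po, direction, rfe, ws, fr):
    """acyclic(ppo_tso U hb_tso), with hb_tso = ws U fr U rfe."""
    return acyclic(ppo_tso(po, direction) | ws | fr | rfe)


direction = {"a": "W", "b": "R", "c": "W", "d": "R"}
po = {("a", "b"), ("c", "d")}
witnesses = {
    (0, 0): (set(), {("b", "c"), ("d", "a")}),
    (0, 1): ({("a", "d")}, {("b", "c")}),
    (1, 0): ({("c", "b")}, {("d", "a")}),
    (1, 1): ({("c", "b"), ("a", "d")}, set()),
}
results = {out: check_tso(po, direction, rfe, set(), fr)
           for out, (rfe, fr) in witnesses.items()}
print(results)
```

Both program-order pairs of i3 are write-read pairs, so they vanish from ppo_tso and no witness contains a cycle: all four outcomes pass, including the one rejected by the Sc criterion.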

2.3 Equivalence with native models

2.3.1 Sc is SC

We show that the SC definition from [17] is equivalent to our definition:

Theorem 3 (Sc is SC)

\[ \forall E\, X,\ Sc.valid\ E\ X \leftrightarrow \exists \xrightarrow{ex},\ seq(\xrightarrow{ex}) \wedge SC.wit(\xrightarrow{ex}) = X \]

Proof [in Coq]

⇒ From $X$ being valid on $Sc$, we have $acyclic\ (\xrightarrow{ghb})$, which means $acyclic\ (\xrightarrow{hb} \cup \xrightarrow{po})$ on $Sc$. We know by cor. 1 that this condition is necessary and sufficient to obtain an equivalent $SC$ execution.

⇐ From the sequential execution $\xrightarrow{ex}$, we produce an $SC.wit$ which is valid on any weaker architecture by thm. 1. □

2.3.2 Tso is TSO

We show that the TSO definition from [1] is equivalent to our definition:

Theorem 4 (Tso is TSO)

\[ \forall E\, X,\ Tso.valid\ E\ X \leftrightarrow \exists \xrightarrow{ex},\ ptso(\xrightarrow{ex}) \wedge TSO.wit(\xrightarrow{ex}) = X \]

Proof [in Coq]

⇒ From $X$ being valid on $Tso$, we know $X$ satisfies $check_{Tso}$ by cor. 2. $check_{Tso}$ gives us an acyclic relation, therefore a partial order on $\mathbb{E}$, such that its restriction to $\mathbb{W}$ is the total order on stores required by $TSO$. As $Tso.ghb$ includes $R{*}$ and $WW$ by construction, we have the final requirements to provide an execution valid on $TSO$.

⇐ From $\xrightarrow{ex}$, we produce a $TSO.wit$ which is valid on any architecture weaker than $Tso$ by thm. 1. □

2.4 Testing

In this section we precisely define our testing methodology and describe our tools.

2.4.1 Tools

litmus To understand the memory model provided by a given machine $M$, we use litmus tests, which are assembly programs with a specified initial state of memory and registers. To run them on a machine, we use our litmus tool, which runs a C skeleton into which the litmus test is encapsulated. For a given test t running on $M$, we collect the final content of memory and registers, thus defining a set of observed outcomes $O_M(t)$.

memevents To compare the memory model as observed on a machine and our theoretical one, we implemented our generic framework in the memevents tool, written in OCaml. The main module axiom is an implementation of the theory presented at section 2: provided an architecture module $A$ such as $Sc.Arch$ or $Tso.Arch$, it outputs all possible execution witnesses (in the absence of loops) that are valid in the memory model $W$ induced by $A$ in the sense of the $valid$ predicate defined at section 2.1.4, whose $final$ states define the set of valid outcomes $V_W(t)$. When there are loops, it unfolds them several times, which gives a subset of the valid executions; this has been enough for our purposes. Moreover, memevents is able to output a counter-example: when a particular outcome is specified, it shows which cycles in the $ghb$ relation invalidate this execution. This gives an insight on why this execution is not allowed on a particular architecture, and on whether barriers are needed or not.

2.4.2 Comparison of models

An additional tool, compare, examines, for a given test t run on a machine $M$, the following cases: $O_M(t) \subseteq V_W(t)$, from which we know our model is not invalidated, and $O_M(t) \not\subseteq V_W(t)$, from which we know our model is invalidated. When $O_M(t) \subseteq V_W(t)$, the most challenging case is when t has an outcome in $V_W(t)$ that is not in $O_M(t)$, that is, an outcome which is valid yet not observed. Several reasons explain this situation: either t has not been run enough to observe it, or the tested machine does not implement the feature highlighted by the test. In that case the model is too permissive with respect to this machine. However, we do not seek the adequation of $O_M(t)$ and $V_W(t)$: doing so would lead us to particularise our model so that it renders the model of the tested machine. As we want to give a model of an architecture, we should on the contrary define a looser model which includes the observed outcomes of any machine that implements the architecture.

To be more precise, given an architecture $A$, a model $W = Wmm(A)$ and an implementation $M$ of $A$, we define two requirements that $W$ must satisfy to be valid and accurate with respect to $M$:

Definition 5 (Validity and accuracy of a model)

\[ valid\ W \;\triangleq\; \forall M,\ \forall t,\ O_M(t) \subseteq V_W(t) \]

\[ accurate_M\ W \;\triangleq\; \forall t,\ V_W(t) \subseteq O_M(t) \]

    test    observed        never observed
    i5      int = false     int = true
    i6      ext = false     ext = true
    i3      WR ⊈ ppo        WR ⊆ ppo
    i4      WW ⊈ ppo        WW ⊆ ppo
    i1      RW ⊈ ppo        RW ⊆ ppo
    i2      RR ⊈ ppo        RR ⊆ ppo

Figure 5: Summary of characteristic tests

Figure 6: Characteristic tests exhibiting relaxations (panels: i1: RW relaxation; i2: RR relaxation; i3: WR relaxation; i4: WW relaxation; i5: rfi relaxation; i6: rfe relaxation)

Thus we require our model to be $valid$ but not necessarily $accurate$ with respect to all its implementations.
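The comparison of observed and valid outcomes amounts to set inclusion, as the following Python sketch shows. It is not the compare tool itself: the function name `compare` and the outcome sets are hypothetical, with outcomes of test i3 written as pairs of register values and the observed set invented for the example.

```python
def compare(observed, valid):
    """Return (not_invalidated, valid_but_unobserved) for one test.

    not_invalidated is True when every observed outcome is valid in the model;
    valid_but_unobserved lists the outcomes the model allows but the runs never showed.
    """
    return observed <= valid, valid - observed


# Hypothetical data for test i3: the model allows all four outcomes,
# while the runs on the machine only exhibited three of them.
valid = {(0, 0), (0, 1), (1, 0), (1, 1)}
observed = {(0, 1), (1, 0), (1, 1)}

ok, missing = compare(observed, valid)
print(ok, missing)
```

A non-empty second component does not invalidate the model: it only means the model is not accurate for that machine on that test, which the requirement above tolerates.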

2.4.3 Characteristic tests

We present here the key tests to understand how to instantiate the parameters of our model. Let us assume given a machine $M$ that implements a model $W$. The main idea is to observe one relaxation (of the store atomicity or ordering constraints) at a time, by considering an execution where all relations involved are global, except the one in question. Thus, if the specified outcome is observed, $M$ exhibits this relaxation: were the relation in question global, there would have been a cycle in the validity check of this execution, which would have been forbidden.

We assume here it is always possible to maintain an $R{*}$ pair in program order, using a dependency between the two accesses. This does not mean we consider all $R{*}$ pairs to be preserved in program order, but only the ones that have a dependency between them. Maintained $RR$ (resp. $RW$) pairs will be denoted by $ll$ (resp. $ls$).

Globality of rf maps

Internal rf maps

    P0                P1
    (a) x ← 1         (d) y ← 1
    (b) r1 ← x        (e) r3 ← y
        ll                ll
    (c) r2 ← y        (f) r4 ← x
    i5    r1 = r3 = 1 ∧ r2 = r4 = 0 ?

The test i5 can be found in [6], with number 2.4: it is claimed to highlight a feature called intra-processor forwarding, and illustrates the visibility of store buffering to the programmer. If the specified outcome of this test is observed, then we consider $rfi$ not to be global on $M$. If the specified outcome never shows up, then $rfi$ can be considered global. Indeed, as depicted at fig. 6, if internal $rf$ were global, there would be a bold $rf$ between events $a$ and $b$ ($W[x]$ and $R[x]$ from $P_0$) and between events $d$ and $e$ ($W[y]$ and $R[y]$ from $P_1$). This would lead to a cycle $a \xrightarrow{rfi} b \xrightarrow{ppo} c \xrightarrow{fr} d \xrightarrow{rfi} e \xrightarrow{ppo} f \xrightarrow{fr} a$ in this execution, which would therefore be forbidden.

External rf maps

    P0                P1                P2
    (a) x ← 1         (b) r1 ← x        (d) r2 ← y
                          ls                ll
                      (c) y ← 1         (e) r3 ← x
    i6    r1 = 1 ∧ r2 = 1 ∧ r3 = 0 ?

The test i6 can also be found in [6], with number 2.6, or in the literature under the name $WRC$ [13], and finally in the Power documentation with name $isa1$ [3]. If the specified outcome of this test is observed, then we consider $rfe$ not to be global on $M$. If the specified outcome never shows up, then $rfe$ can be considered global. Indeed, as depicted at fig. 6, if external $rf$ were global, there would be a bold $rf$ between events $a$ and $b$ ($W[x]$ from $P_0$ and $R[x]$ from $P_1$) and between events $c$ and $d$ ($W[y]$ from $P_1$ and $R[y]$ from $P_2$). This would lead to a cycle $a \xrightarrow{rfe} b \xrightarrow{ppo} c \xrightarrow{rfe} d \xrightarrow{ppo} e \xrightarrow{fr} a$ in this execution, which would therefore be forbidden.

When some $rf$ is observed not to be global on a machine, as this is the weakest condition on $rf$, one should assume that any machine that implements $W$ has the corresponding parameter set to $false$; otherwise $W$ could be invalidated.

Preserved program order

$WR$ pairs The test i3, which is depicted at fig. 1, can be found in [6], with number 2.3a. If the specified outcome is observed, then $WR$ pairs are not preserved in program order: the $ppo$ parameter of this machine should not include $WR$ pairs. If the specified outcome never shows up, then the $ppo$ of this machine includes $WR$. Indeed, as depicted at fig. 6, if $WR$ pairs were global, there would be a bold $ppo$ between events $a$ and $b$ ($W[x]$ and $R[y]$ on $P_0$) and between $c$ and $d$ ($W[y]$ and $R[x]$ on $P_1$). This would lead to a cycle $a \xrightarrow{ppo} b \xrightarrow{fr} c \xrightarrow{ppo} d \xrightarrow{fr} a$ in this execution, which would therefore be forbidden.

$WW$ pairs

    P0                P1
    (a) x ← 1         (c) y ← 3
    (b) y ← 2         (d) x ← 4
    i4    x = 1 ∧ y = 3 ?

If the specified outcome of the test i4 is observed, then $WW$ pairs are not maintained in program order. If it never shows up, then the $ppo$ of this machine includes $WW$. Indeed, as depicted at fig. 6, if $WW$ pairs were global, there would be a bold $ppo$ between events $a$ and $b$ ($W[x]$ and $W[y]$ on $P_0$) and between $c$ and $d$ ($W[y]$ and $W[x]$ on $P_1$). This would lead to a cycle $a \xrightarrow{ppo} b \xrightarrow{ws} c \xrightarrow{ppo} d \xrightarrow{ws} a$ in this execution, which would therefore be forbidden.

RW pairs

Test $i_1$:
P0              P1
(a) r1 ← x      (c) y ← 2
                fence
(b) y ← 1       (d) x ← 1
Specified outcome: $r1 = 1 \wedge y = 2$?

If the specified outcome of the test $i_1$ is observed, then RW pairs are not maintained in program order. If it never shows up, then the ppo of this machine includes RW. Indeed, as depicted in fig. 6, if RW pairs were global, there would be a bold ppo between events $a$ and $b$ ($R[x]$ and $W[y]$ on $P_0$). If the barrier on $P_1$ is B-cumulative, it orders events $c$ and $d$ but also $c$ and $a$. This would lead to a cycle $a \xrightarrow{\textit{ppo}} b \xrightarrow{\textit{ws}} c \xrightarrow{\textit{ab}} a$ in this execution, which would therefore be forbidden.

RR pairs

Test $i_2$:
P0              P1
(a) x ← 1       (c) r3 ← y
fence
(b) y ← 1       (d) r4 ← x
Specified outcome: $r3 = 1 \wedge r4 = 0$?

The test $i_2$ can be found in [6], with number 2.1. If $M$ has global external rf and preserves WW pairs, and the specified outcome is observed, then RR pairs are not preserved in ppo. If the specified outcome is never observed under the same hypothesis, then ppo includes RR pairs.

If $M$ does not have global external rf, we can consider the same test modified so that a B-cumulative barrier is between the instructions on $P_0$: the B-cumulativity enforces a global ordering between $(a)\,W[x]$ on $P_0$ and $(c)\,R[y]$ on $P_1$. If the specified outcome is observed, we can conclude that RR pairs are not in ppo: otherwise, there would be a bold ppo between $c$ and $d$ ($R[y]$ and $R[x]$ on $P_1$), which would lead to a cycle $a \xrightarrow{\textit{ab}} c \xrightarrow{\textit{ppo}} d \xrightarrow{\textit{fr}} a$ in the execution, which would therefore be forbidden.

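3 Semantics of barriers

3.1 Barriers guarantee

Let us consider two architectures $A_1 \le A_2$. We examine here what the barriers provided by $A_1$ should guarantee to restore $A_2$.

We write $\xrightarrow{\textit{rf}_{2\setminus 1}}$ for $\xrightarrow{\textit{rf}_2?} \setminus \xrightarrow{\textit{rf}_1?}$. We define the predicate $A_1.\textit{fb}$ (for "fully barriered on $A_1$") as follows:

$$\xrightarrow{\textit{ab}_1} \;=\; \left(\xrightarrow{\textit{rf}_{2\setminus 1}}\right)?;\ \xrightarrow{\textit{ppo}_2};\ \left(\xrightarrow{\textit{rf}_{2\setminus 1}}\right)?$$

We show that this is a sufficient condition on the barriers provided by $A_1$ to restore an execution valid on $A_2$: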

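Theorem 5 (Barriers guarantee)

$$\forall A_1\, A_2,\ (A_1 \le A_2) \Rightarrow \forall E\, X,\ A_1.\textit{valid}\ E\ X \wedge A_1.\textit{fb}\ E\ X \Rightarrow A_2.\textit{valid}\ E\ X$$

Proof [in Coq]. Suppose that the execution is not valid on $A_2$: thus we have a cycle in $A_2.\textit{ghb}$, that is, in $\xrightarrow{\textit{ws}} \cup \xrightarrow{\textit{fr}} \cup \xrightarrow{\textit{rf}_2?} \cup \xrightarrow{\textit{ppo}_2}$. Such a cycle is a cycle in

$$\xrightarrow{\textit{ws}} \cup \xrightarrow{\textit{fr}} \cup \xrightarrow{\textit{rf}_1?} \cup \xrightarrow{\textit{ppo}_1} \cup \left(\xrightarrow{\textit{rf}_{2\setminus 1}}\right)? \cup \xrightarrow{\textit{ppo}_2} \cup \left(\xrightarrow{\textit{rf}_{2\setminus 1}}\right)?$$

which implies a cycle in $\xrightarrow{\textit{ws}} \cup \xrightarrow{\textit{fr}} \cup \xrightarrow{\textit{rf}_1?} \cup \xrightarrow{\textit{ppo}_1} \cup \xrightarrow{\textit{ab}_1}$, that is, $A_1.\textit{ghb}$. Thus we contradict the validity of $X$ on $A_1$.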



This result provides an insight into what power a barrier provided by an architecture $A_1$ (e.g. PowerPC) should have to restore a stronger model (e.g. SC). First, the barrier should restore the pairs that are preserved in program order on $A_2$ but not on $A_1$. Second, the barrier should compensate for the lack of relations between write and read events, which we model by rf not being a global relation in the general case. Thus, if the rf relation is not global on $A_1$ but global on $A_2$, we overcome the lack of globality of rf by ordering the beginning with the end of the chain. This is how we interpret the cumulativity of barriers as stated in the PowerPC documentation [3]. We furthermore interpret the A-cumulativity (resp. B-cumulativity) property as applying to barriers that enforce ordering of pairs in $\xrightarrow{\textit{rf}};\xrightarrow{\textit{po}}$ (resp. $\xrightarrow{\textit{po}};\xrightarrow{\textit{rf}}$). We consider a barrier that only preserves pairs in po to be non-cumulative.

Provided a barrier that has such power, we also have an insight into where to place these barriers in the code: the statement of the theorem indicates that barriers should be placed between any pairs in $\textit{ppo}_2$ such that one of the components of the pair (or both) may give rise to a rf relation that is to be global on the stronger architecture $A_2$ but is not on $A_1$.

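From any architecture to Sc

We designed semantics for a barrier that, for any weak memory model induced by an architecture $A$, would suffice to re-establish Sc.

$$e_1 \xrightarrow{\textit{fenced}} e_2 \;\triangleq\; \exists b,\ e_1 \xrightarrow{\textit{po}} b \wedge b \xrightarrow{\textit{po}} e_2$$

$$\xrightarrow{\textit{fenced}} \;=\; \xrightarrow{\textit{po}} \qquad \text{(placement)}$$

$$\begin{aligned}
e_1 \xrightarrow{\textit{ab}} e_2 \;\triangleq\; & \; e_1 \xrightarrow{\textit{fenced}} e_2 && \text{(base)} \\
\vee\ & \; e_1 \xrightarrow{\textit{rf}} r \wedge r \xrightarrow{\textit{ab}} e_2 && \text{(A-cumulativity)} \\
\vee\ & \; e_1 \xrightarrow{\textit{ab}} w \wedge w \xrightarrow{\textit{rf}} e_2 && \text{(B-cumulativity)}
\end{aligned}$$

This barrier orders all pairs in po, as the base case indicates; it also compensates for the possible lack of visibility of rf on $A$ by ordering the two ends of a chain $\xrightarrow{\textit{rf}};\xrightarrow{\textit{po}}$ (resp. $\xrightarrow{\textit{po}};\xrightarrow{\textit{rf}}$), as the A-cumulativity (resp. B-cumulativity) case indicates.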
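The recursive definition of ab above can be read as a least fixpoint. The sketch below is one possible reading, assuming relations are finite sets of pairs; the function names `compose` and `cumulative_ab` are ours, not the report's.

```python
def compose(r1, r2):
    """Relational composition r1; r2 on sets of (src, dst) pairs."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


def cumulative_ab(fenced, rf):
    """Least relation containing fenced and closed under the A- and
    B-cumulativity rules (rf; ab and ab; rf)."""
    ab = set(fenced)
    while True:
        new = ab | compose(rf, ab) | compose(ab, rf)
        if new == ab:
            return ab
        ab = new


# Hypothetical instance: a barrier separates a and b on one processor,
# and a read c on another processor reads from b.
fenced = {("a", "b")}
rf = {("b", "c")}
print(sorted(cumulative_ab(fenced, rf)))
```

Here B-cumulativity extends the ordering from the fenced pair (a, b) to the remote reader c, which is the kind of ordering used for the modified test $i_2$ in section 2.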

3.2 Considering a weaker guarantee

We said that the barrier should first restore the pairs that are preserved in program order on the stronger architecture. Thus, for the simple case where none of the components of a pair in ppo gives rise to a rf relation that is global on $A_2$ but not on $A_1$, there is no need for a barrier as powerful as above: a barrier that only orders the events that surround it statically would be enough. Consider the $\textit{wfb}$ predicate:

$$\textit{wfb} \;\triangleq\; \xrightarrow{\textit{ppo}_2} \setminus \xrightarrow{\textit{ppo}_1}$$

The following result arises as a natural corollary of thm. 5.

Corollary 3 (Non-cumulative barriers guarantee)

$$\forall A_1\, A_2,\ \left(A_1 \le A_2 \wedge \xrightarrow{\textit{rfe}_1?} = \xrightarrow{\textit{rfe}_2?}\right) \Rightarrow \forall E\, X,\ A_1.\textit{valid}\ E\ X \wedge A_1.\textit{wfb}\ E\ X \Rightarrow A_2.\textit{valid}\ E\ X$$

From Tso to Sc

As rfe are considered global in both Tso and Sc, we only need a non-cumulative barrier to restore Sc from Tso. We define the pairs of writes and reads in po as follows: $\textit{WR} \triangleq \{(w, r) \mid w \xrightarrow{\textit{po}} r\}$. To restore Sc from Tso, we need to preserve the WR pairs, as they are preserved on Sc but not on Tso. As internal rf are WR pairs, such a barrier would compensate for the lack of visibility of internal rf on Tso as well. Therefore we define the following barrier semantics:

$$\xrightarrow{\textit{fenced}} \;=\; \textit{WR} \qquad \text{(placement)}$$

$$e_1 \xrightarrow{\textit{ab}} e_2 \;\triangleq\; e_1 \xrightarrow{\textit{fenced}} e_2 \qquad \text{(base)}$$

We define the predicate $\textit{Tso.wfb}$ as follows: $\textit{Tso.wfb} \triangleq (\textit{Tso.ab} = \textit{WR})$, and the following theorem arises as a natural consequence of cor. 3:

Theorem 6 (Barriers placement on Tso)

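Proof [in Coq]. From $X$ being $\textit{wfb}$ on Tso, we have $\textit{Tso.ghb} = \xrightarrow{\textit{ws}} \cup \xrightarrow{\textit{fr}} \cup \xrightarrow{\textit{ppo\_tso}} \cup \xrightarrow{\textit{rfe}} \cup\, \textit{WR}$, which is acyclic since $X$ is valid on Tso. As WR covers both rfi and the WR pairs that are not in rf, we get the acyclicity of $\textit{Sc.ghb}$ directly.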
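The placement condition fenced = WR can be illustrated by enumerating the write-read pairs of a program. This is a minimal sketch under our own assumptions: threads are given as lists of event names in program order, and the helpers `program_order` and `wr_pairs` are hypothetical names, not part of the report.

```python
from itertools import combinations


def program_order(threads):
    """All (earlier, later) pairs of events within each thread."""
    po = set()
    for events in threads:
        # combinations keeps the listing order of each thread.
        po |= set(combinations(events, 2))
    return po


def wr_pairs(po, kind):
    """Pairs of po whose source is a write and whose target is a read."""
    return {(w, r) for (w, r) in po if kind[w] == "W" and kind[r] == "R"}


# Test i3 as described in section 2: P0 runs (a) W[x]; (b) R[y],
# P1 runs (c) W[y]; (d) R[x].
threads = [["a", "b"], ["c", "d"]]
kind = {"a": "W", "b": "R", "c": "W", "d": "R"}

po = program_order(threads)
fences_needed = wr_pairs(po, kind)
print(sorted(fences_needed))
```

Under this reading, each pair returned is a place where a non-cumulative barrier would have to sit for the execution to be $\textit{wfb}$ on Tso.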



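From any architecture $A \le \textit{Tso}$ to Tso

We designed semantics for a barrier that, for any weak memory model induced by an architecture $A$ weaker than Tso, would re-establish Tso:

$$\xrightarrow{\textit{fenced}} \;=\; \xrightarrow{\textit{ppo\_tso}} \qquad \text{(placement)}$$

$$\begin{aligned}
e_1 \xrightarrow{\textit{ab}} e_2 \;\triangleq\; & \; e_1 \xrightarrow{\textit{fenced}} e_2 && \text{(base)} \\
\vee\ & \; e_1 \xrightarrow{\textit{rfe}} r \wedge r \xrightarrow{\textit{ab}} e_2 && \text{(A-cumulativity)} \\
\vee\ & \; e_1 \xrightarrow{\textit{ab}} w \wedge w \xrightarrow{\textit{rfe}} e_2 && \text{(B-cumulativity)}
\end{aligned}$$

Here, all pairs except WR are to be preserved in program order, which is expressed by the placement condition $\xrightarrow{\textit{fenced}} = \xrightarrow{\textit{ppo\_tso}}$. As internal rf are not considered global in Tso, there is no need to compensate for them: the ordering power of the barrier concerns only the external rf, as expressed by the A- and B-cumulativity cases.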

We retrieve what is described in the Sparc V9 documentation [2]: TSO is indeed obtained from PSO, which is obtained from RMO, by barrier placements. In our framework, we define Rmo and Pso as follows, where $R{*}_l$ (resp. $WW_l$) represents all pairs of $R{*}$ (resp. $WW$) to the same location:

$$\textit{Rmo.Arch} \;\triangleq\; (R{*}_l \cup WW_l,\ \textit{false},\ \textit{true},\ \textit{ab}_1)$$

$$\textit{Pso.Arch} \;\triangleq\; (R{*} \cup WW_l,\ \textit{false},\ \textit{true},\ \textit{ab}_2)$$

As for Tso, we deduce from the Val axiom that external rf are global, whereas internal ones are not.

The documentation specifies that PSO is obtained from RMO by adding LoadLoad and LoadStore barriers after each read. This statement has two consequences in our framework: first, $R{*}$ pairs are preserved in Pso, and second, to restore Pso from Rmo, since the external rf are already global on Rmo, one should use a non-cumulative barrier that preserves $R{*}$ pairs.

TSO is obtained from PSO by adding StoreStore barriers after each write. We conclude that WW pairs are preserved in Tso, and that to restore Tso from Pso, one should use a non-cumulative barrier that preserves WW pairs. Thus, from Rmo to Tso, a non-cumulative barrier that preserves $R{*}$ and WW is needed, as stated by cor. 3.

4 Case study: a Power model

In this section we define an architecture for Power, that is, we define relations ppo and ab [...] model; we rather confront a tentative PowerPC model against actual PowerPC machines.

4.1 Complete event structures and execution witnesses

Henceforth, we will reason on complete event structures, from which section 2 abstracts. We shall avoid an exhaustive treatment of complete event structures; interested readers may refer to [18, 9]. Instead, we sketch the main ideas.

Additional events. In addition to memory events, the execution of an instruction may generate a variety of events: most instructions generate register events that render accesses to registers, memory barrier instructions generate barrier events, and conditional branch instructions generate commit events that express branching decisions. We write $B$ for the set of barrier events and $C$ for the set of commits, $b$ and $c$ being typical elements. We shall handle three memory barrier instructions: isync, sync and lwsync. The corresponding events are distinguished by predicates is-isync, etc. As in previous sections, we still denote the set of memory events by $E$ (typical element $e$), the set of memory read events by $R$ (typical element $r$), and the set of memory write events by $W$ (typical element $w$).

Extended or additional relations. We extend the program order relation po to all events. In particular, po now orders both memory and barrier events. Moreover, complete event structures comprise additional relations, more specifically intra-instruction causality iico, which represents the ordering constraints of events within the same instruction. Finally, the following relation fenced($k$) renders the presence of a barrier of style $k$ between memory events $e_1$ and $e_2$:

$$e_1 \xrightarrow{\textit{fenced}(k)} e_2 \;\triangleq\; \exists b,\ \textit{is-}k(b) \wedge e_1 \xrightarrow{\textit{po}} b \xrightarrow{\textit{po}} e_2.$$
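The definition of fenced($k$) translates directly into a set comprehension. The sketch below assumes po is given as a transitive set of pairs over memory and barrier events, and that a dictionary maps each barrier event to its style; the function name `fenced` and its parameters are our own illustrative choices.

```python
def fenced(kind, po, barrier_kind, memory_events):
    """Pairs (e1, e2) of memory events such that some barrier b of the
    given kind satisfies e1 po-> b po-> e2."""
    return {
        (e1, e2)
        for (e1, b) in po
        for (b2, e2) in po
        if b == b2
        and barrier_kind.get(b) == kind
        and e1 in memory_events
        and e2 in memory_events
    }


# Hypothetical thread: load b, then a sync barrier s, then store c.
po = {("b", "s"), ("s", "c"), ("b", "c")}
barrier_kind = {"s": "sync"}
memory_events = {"b", "c"}

print(fenced("sync", po, barrier_kind, memory_events))
print(fenced("lwsync", po, barrier_kind, memory_events))
```

Only the pair straddling the barrier is returned, and only for the matching barrier style.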

Execution witnesses also become more complete, as relation rf now relates register events. We write rf-reg for the subrelation of rf that relates register stores to the register loads that read their values. As a side note, relation rf-reg derives from sequential execution in a much stronger sense than rf on memory. Namely, $w \xrightarrow{\textit{rf-reg}} r$ when $w$ is maximal amongst the predecessors of $r$ in program order.

Illustration. Figure 7 shows a program fragment and a fragment of the corresponding complete event structure, together with an execution witness.

The first instruction is an indexed load from memory, lwz: the base address $y$ is taken from register r5 (which has been written into elsewhere), the index is 0, and the value read from the effective address $y + 0$ is stored into register r2. We here view three events, labelled on the figure as $c$, $a$ and $d$. As an example of intra-instruction dependency, $a \xrightarrow{\textit{iico}} d$ expresses that loading from memory precedes storing into register r2.

The compare and branch sequence is less trivial: the instruction cmpwi r2,1 compares the content of r2 to the constant 1 and stores the result of the comparison (2 means equality) into the control register cr0. The next instruction is a conditional branch bne, with branching decision (commit event $h$) conditioned by the contents of control register cr0 ($g \xrightarrow{\textit{iico}}$ [...] equal. Thus, since the control register signals equality, the branch is not taken and the next instruction to be executed is the store stw, as shown by the arrows $h \xrightarrow{\textit{po}} i$, $h \xrightarrow{\textit{po}} j$ and $h \xrightarrow{\textit{po}} b$.

[Figure 7: program fragment (L1: lwz r2,0(r5); cmpwi r2,1; bne L1; stw r3,0(r6)) with a fragment of its complete event structure and an execution witness.]

4.2 Globality of rfmaps

Running tests $i_5$ and $i_6$ on a Power machine yields the specified outcomes. Thus, we consider the rfi and rfe relations not to be global for Power.

4.3 Preserved program order ppo

Some parts of the program order relation po are reflected in the global happens-before relation. In this section we pick those out, defining a preserved-program-order relation ppo.

Data dependencies. Data dependencies within a processor arise from any combination of the reads-from relation on registers and the intra-instruction causality relation:

$$\xrightarrow{\textit{dd}} \;\triangleq\; \left(\xrightarrow{\textit{rf-reg}} \cup \xrightarrow{\textit{iico}}\right)^{+}$$

(here $R^{+}$ denotes the transitive closure of $R$). Note that this relation includes no dependencies via memory.

The restriction of the above to memory events is written dd-mem:

$$\xrightarrow{\textit{dd-mem}} \;\triangleq\; \xrightarrow{\textit{dd}} \cap\ (M \times M)$$

Control dependencies. A memory write is control dependent on a memory read if there is an intervening commit (of a conditional branch) that is data-dependent on the read and precedes (in program order) the write:

$$r \xrightarrow{\textit{ctrl}} w \;\triangleq\; \exists c \in C.\ r \xrightarrow{\textit{dd}} c \xrightarrow{\textit{po}} w$$

This relation models the fact that memory writes are not speculated, whereas reads can be.

Isync dependencies. A memory event is isync-dependent on a memory read if there exists an intervening commit (of a conditional branch) that is data-dependent on the read and is separated (in program order) by an isync from the event:

$$r \xrightarrow{\textit{isync}} m \;\triangleq\; \exists c \in C.\ r \xrightarrow{\textit{dd}} c \wedge c \xrightarrow{\textit{fenced}(\textit{isync})} m$$

Note that this only adds anything beyond control dependencies in the case of a memory read/read pair.

Figure 7 gives an example of a control dependency, from load $a$ to store $b$ through commit $h$.

All together. The preserved program order relation is just the union of data dependencies for memory events, control dependencies, and isync dependencies:

$$\xrightarrow{\textit{ppo}} \;\triangleq\; \xrightarrow{\textit{dd-mem}} \cup \xrightarrow{\textit{ctrl}} \cup \xrightarrow{\textit{isync}}$$

Note that memory stores are never the source of a ppo pair, as a consequence of the instruction semantics: a memory store cannot be the source of an iico pair, nor of an rf-reg pair. Thus, a natural partition of ppo pairs is into load/load pairs and load/store pairs. We include these pairs in the global happens-before relation:

• for load-load pairs: we refer to [5, pp. 653–668], which states that in such a situation, load $r_1$ will be performed before load $r_2$ with respect to any processor, which we interpret as $r_1$ being globally performed before $r_2$. It is not clear to us whether or not these notions are equivalent in the case of a load, or whether our interpretation makes sense w.r.t. architectural insights;

• for load-store pairs: we deduce from $r \xrightarrow{\textit{ppo}} w$ that $r$ happens before the store is initiated. Since we consider loads to be atomic (thus considering that they are globally performed as soon as they are initiated) and stores to be initiated before being globally performed, we deduce that $r$ is globally performed before $w$ is, and include such ppo pairs in the global happens-before relation.
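The dependency relations of this section can be computed mechanically on a small event structure. The sketch below is a hypothetical encoding of the fragment of Figure 7: the edges $a \xrightarrow{\textit{iico}} d$ and the three po edges out of $h$ come from the text, while the remaining iico and rf-reg edges are our assumption about what the figure depicts.

```python
def transitive_closure(rel):
    """Smallest transitive relation containing rel."""
    closure = set(rel)
    while True:
        extra = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if extra <= closure:
            return closure
        closure |= extra


# Events: a = R[y], b = W[z] (memory); h = commit; the others are register events.
memory, commits = {"a", "b"}, {"h"}
reads, writes = {"a"}, {"b"}

# Assumed edges, read off the Figure 7 fragment.
iico = {("c", "a"), ("a", "d"), ("e", "f"), ("g", "h"), ("i", "b"), ("j", "b")}
rf_reg = {("d", "e"), ("f", "g")}
po = {("h", "i"), ("h", "j"), ("h", "b")}

dd = transitive_closure(rf_reg | iico)
dd_mem = {(x, y) for (x, y) in dd if x in memory and y in memory}
ctrl = {
    (r, w)
    for (r, c) in dd
    if r in reads and c in commits
    for (c2, w) in po
    if c2 == c and w in writes
}
ppo = dd_mem | ctrl  # this fragment contains no isync

print(sorted(ppo))
```

On this fragment the only ppo pair is the control dependency from load $a$ to store $b$ through commit $h$, as stated in the text.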

4.4 Values do not come out of thin air

In the application of our framework to Power, load-store pairs receive a particular treatment: we extend load-store ppo edges by following rf ones, thus defining another relation ppo-ext, which we will also include in the global happens-before relation. Consider a triple $r \xrightarrow{\textit{ppo}} w \xrightarrow{\textit{rf}} r'$, where $r'$ does not need to originate from the same processor as $r$ and $w$. Here $r$ happens before the store is initiated, and $r'$ happens after the store is initiated. Thus, intuition suggests that $r$ globally happens before $r'$. We define the extension of ppo by rf as follows:

$$r \xrightarrow{\textit{ppo-ext}} r' \;\triangleq\; \exists w,\ r \xrightarrow{\textit{ppo}} w \xrightarrow{\textit{rf}} r'$$

Our extension of load-store dependencies is a generalization of a check that weak models often, and arguably must, incorporate: the "causality check" of [10], or the "values do not come out of thin air" of [7]. The canonical example of such a check is as follows:

P0              P1
(a) r1 ← x      (c) r1 ← y
(b) y ← r1      (d) x ← r1

We further assume that $x$ and $y$ initially contain 0. Without specific provision in the model, the absurd outcome $x = 1;\ y = 1$ might remain valid, as demonstrated by the following execution witness:

[Figure: execution witness with events a: R[x]=1, b: W[y]=1, c: R[y]=1, d: W[x]=1; ppo-proc edges from a to b and from c to d; rf edges from b to c and from d to a.]

Our model invalidates the execution thanks to the ppo extension. Namely, we have $a \xrightarrow{\textit{ppo-ext}} c$ (by $a \xrightarrow{\textit{ppo-proc}} b \xrightarrow{\textit{rf}} c$) and $c \xrightarrow{\textit{ppo-ext}} a$ (by $c \xrightarrow{\textit{ppo-proc}} d \xrightarrow{\textit{rf}} a$). Hence, ppo alone is cyclic. A fortiori ghb is cyclic, since ppo is included in ghb. Note that we could prevent values from coming out of thin air by adding another sanity check on the rf relation in our generic framework, following Alpha's documentation.

There are two reasons for considering such an extension to be global (that is, included in the global happens-before relation):

• it rules out some examples in which values would appear out of thin air, as in the example presented above;

• from a global time perspective, $r$ and $r'$ are ordered respectively before and after the point in time when $w$ is initiated; for more details, see section 6, where an example is discussed.

This is perhaps intuitively plausible, and it suffices to rule out some examples in which values would appear out of thin air (we suppose here that the architecture should rule out thin-air reads, though that might be debated), and to correspond with our Power 5 experiments on the test presented in section 6. However, such an extension does not seem to be forced. Therefore, it is not clear to us whether we should forbid that behaviour in the model.
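The thin-air example of this section can be checked mechanically. The following is a minimal sketch assuming relations are sets of pairs; `compose` and `is_cyclic` are helper names of ours, and the edges are those of the canonical two-processor example.

```python
def compose(r1, r2):
    """Relational composition r1; r2 on sets of (src, dst) pairs."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


def is_cyclic(rel):
    """True if the transitive closure of rel relates some event to itself."""
    closure = set(rel)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return any(x == y for (x, y) in closure)
        closure |= extra


# Canonical example: a = R[x], b = W[y] on P0; c = R[y], d = W[x] on P1.
ppo = {("a", "b"), ("c", "d")}
rf = {("b", "c"), ("d", "a")}

# r ppo-ext r' whenever r ppo-> w rf-> r' for some write w.
ppo_ext = compose(ppo, rf)

print(sorted(ppo_ext))
print(is_cyclic(ppo_ext))  # the extension alone already forms a cycle
print(is_cyclic(ppo))      # the unextended load-store edges do not
```

Because rf is not taken to be global here, it is the extension, and not ppo together with rf, that rejects the execution.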

Illustration. We consider here the litmus test adir1v3 (a variation on [7, Test 1]).

Figure 8 shows the program and a non-SC execution ($\xrightarrow{\textit{hb}} \cup \xrightarrow{\textit{po}}$ is cyclic). One may first notice the ppo relation between load $b$ and store $c$. It follows from a data dependency, since the effective address of $c$ is exactly the value read by $b$ (i.e. the address of location $y$). The relation ghb is highlighted with black bold arrows. We have:

1. $b \xrightarrow{\textit{ppo}} c$ (data dependency), and thus $b \xrightarrow{\textit{ppo-ext}} d$ (by $b \xrightarrow{\textit{ppo}} c \xrightarrow{\textit{rf}} d$ and ppo-extension).

2. $d \xrightarrow{\textit{sync}} e$ (sync instruction), and thus $c \xrightarrow{\textit{ab}} e$ (by $c \xrightarrow{\textit{rf}} d \xrightarrow{\textit{sync}} e$ and A-cumulativity).

3. $e \xrightarrow{\textit{fr}} a$, by definition of fr.

P0              P1
(a) x ← &y      (d) r3 ← y
(b) r6 ← x      sync
(c) *r6 ← 1     (e) r4 ← x
Initially: x=&z; Allowed: 1:r3=1; 1:r4=&z;

Figure 8: Litmus test adir1v3

Clearly, ghb is acyclic and the execution is valid. In some sense, validity follows from $a \xrightarrow{\textit{rf}} b$ not being global, since there is a cycle $a \xrightarrow{\textit{rf}} b \xrightarrow{\textit{ghb}} a$. Note that even if we strengthen the ppo relation by its extension, this outcome is still valid.

This rf relation is internal to a processor; by considering it not to be global, we model the presence of a store buffer on this processor, which we suppose to be at least a part of the reason why this behaviour is observed.

4.5 Cumulative memory barriers

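sync. The sync barrier is the incarnation of the SC-restoring cumulative barrier described in section 3.1, whose definition we expand for clarity:

$$\xrightarrow{\textit{sync}} \;\triangleq\; \xrightarrow{\textit{fenced}(\textit{sync})}$$

$$\begin{aligned}
e_1 \xrightarrow{\textit{ab-sync}} e_2 \;\triangleq\; & \; e_1 \xrightarrow{\textit{sync}} e_2 && \text{(base)} \\
\vee\ & \; e_1 \xrightarrow{\textit{rf}} r \xrightarrow{\textit{sync}} e_2 && \text{(A-cumulativity)} \\
\vee\ & \; e_1 \xrightarrow{\textit{sync}} w \xrightarrow{\textit{rf}} e_2 && \text{(B-cumulativity)} \\
\vee\ & \; e_1 \xrightarrow{\textit{rf}} r \xrightarrow{\textit{sync}} w \xrightarrow{\textit{rf}} e_2 && \text{(A/B-cumulativity)}
\end{aligned}$$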

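lwsync. PowerPC features a lightweight cumulative barrier, lwsync, whose semantics we define as follows:

$$\xrightarrow{\textit{lwsync}} \;\triangleq\; \xrightarrow{\textit{fenced}(\textit{lwsync})} \cap\ ((W \times W) \cup (R \times E))$$

$$\begin{aligned}
e_1 \xrightarrow{\textit{ab-lw}} e_2 \;\triangleq\; & \; e_1 \xrightarrow{\textit{lwsync}} e_2 && \text{(base)} \\
\vee\ & \; e_1 \xrightarrow{\textit{rf}} r \wedge r \xrightarrow{\textit{ab-lw}} e_2 \wedge e_2 \in W && \text{(A-cumulativity)} \\
\vee\ & \; e_1 \xrightarrow{\textit{ab-lw}} w \wedge w \xrightarrow{\textit{rf}} e_2 \wedge e_1 \in R && \text{(B-cumulativity)}
\end{aligned}$$

In other words, lwsync acts as sync except on store-load pairs, the exception impacting both the base and the cumulativity cases.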
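The difference between the two barriers can be made concrete on a small example. The sketch below encodes the sync and lwsync definitions over sets of pairs; the function names and the example execution are hypothetical, chosen only to exercise the store-load exception.

```python
def compose(r1, r2):
    """Relational composition r1; r2 on sets of (src, dst) pairs."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


def ab_sync(sync, rf):
    """Base, A-, B- and A/B-cumulativity cases of the sync barrier."""
    return (sync | compose(rf, sync) | compose(sync, rf)
            | compose(compose(rf, sync), rf))


def ab_lwsync(fenced_lw, rf, reads, writes):
    """Least fixpoint of the lwsync definition (store-load pairs excluded)."""
    base = {(e1, e2) for (e1, e2) in fenced_lw
            if (e1 in writes and e2 in writes) or e1 in reads}
    ab = set(base)
    while True:
        a_cumul = {(e1, e2) for (e1, e2) in compose(rf, ab) if e2 in writes}
        b_cumul = {(e1, e2) for (e1, e2) in compose(ab, rf) if e1 in reads}
        new = ab | a_cumul | b_cumul
        if new == ab:
            return ab
        ab = new


# Hypothetical execution: P0 runs (a) W[x]; barrier; (b) R[y]; (c) W[z],
# and (d) R[z] on another processor reads from c.
reads, writes = {"b", "d"}, {"a", "c"}
fenced = {("a", "b"), ("a", "c")}
rf = {("c", "d")}

print(sorted(ab_sync(fenced, rf)))
print(sorted(ab_lwsync(fenced, rf, reads, writes)))
```

With a sync, the store $a$ is ordered before the load $b$ and, by B-cumulativity, before the remote read $d$; with an lwsync, only the store-store pair $(a, c)$ survives.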

Finally we define relation ab as the union of ab-sync and ab-lw.

Analogy between ppo-ext and B-cumulativity. The ppo-ext extension is arguably an analog of B-cumulativity. Here we conjecture that barriers implement A-cumulativity by waiting for some stores to be performed globally, in which case ppo ignores the issue. By contrast, B-cumulativity on a load-store pair demands no specific actions, as the natural consequence of a $w \xrightarrow{\textit{rf}} r$ implying that $r$ is performed only once $w$ is issued.

5 Barrier experiments

5.1 Official tests

A programming note in [3, p. 415] describes two examples in precise prose. We formulate those as invalid executions of litmus tests.

isa1. Let us first examine the simpler test isa1, given as pseudo-code at the top of figure 9. The PowerPC documentation states: "Cumulative ordering dictates that the value loaded from location x by processor 2 is 1." Our interest is in an officially invalid execution, which our Power model should also deem invalid (bottom of the figure). Thus, we interpret the above prescription as forbidding the value loaded from location x by processor 2 to be 0, the initial contents of $x$.

Test isa1:
P0              P1              P2
(a) x ← 1       (b) r1 ← x      (d) r1 ← y
                sync            sync
                (c) y ← 2       (e) r2 ← x
Forbidden: 1:r1=1; 2:r1=2; 2:r2=0;

Figure 9: Officially invalid execution of isa1

To relate an execution graph to a litmus test, one may first relate events to instructions, using the event annotations ($(a)$, $(b)$, ...) in the program text and the po arrows in graphs. For instance, $P_1$ performs a load from location x, reading 1 (event $b$), and a store of 2 to location y (event $c$), and those are separated by a sync instruction (relation $b \xrightarrow{\textit{sync}} c$). Other arrows are as follows: dashed arrows give relation rf, with pending arrows to read events being loads from the initial state, and pending arrows from write events being stores to the final state; while bold arrows give relation ghb. For instance, $e \xrightarrow{\textit{fr}} a$ results from event $e$ reading the initial value of location x, which is overwritten by event $a$. Or, $a \xrightarrow{\textit{ab}} d$ results from barrier cumulativity by $a \xrightarrow{\textit{rf}} b \xrightarrow{\textit{sync}} c \xrightarrow{\textit{rf}} d$.

We can now reach interesting conclusions quite easily: by cor. 1, the execution shown is not SC, since there is a cycle $a \xrightarrow{\textit{rf}} b \xrightarrow{\textit{po}} c \xrightarrow{\textit{rf}} d \xrightarrow{\textit{po}} e \xrightarrow{\textit{fr}} a$. More importantly, the execution is not valid in the Power model, since there are cycles in ghb [...] cumulative ordering of storage accesses preceding a memory barrier. As a consequence, the cycle $a \xrightarrow{\textit{ab}} c \xrightarrow{\textit{ab}} e \xrightarrow{\textit{fr}} a$ is the most illustrative. Namely, $a \xrightarrow{\textit{ab}} c$ follows from $a \xrightarrow{\textit{rf}} b$ and $b \xrightarrow{\textit{sync}} c$; while $c \xrightarrow{\textit{ab}} e$ follows from $c \xrightarrow{\textit{rf}} d$ and $d \xrightarrow{\textit{sync}} e$.
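The isa1 argument can be replayed on the execution's edges. This is a minimal sketch assuming the rf, sync and fr edges listed in the text and a set-of-pairs encoding; `compose`, `ab_sync` and `is_cyclic` are our own helper names, and only the ab and fr components of ghb are modelled.

```python
def compose(r1, r2):
    """Relational composition r1; r2 on sets of (src, dst) pairs."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}


def ab_sync(sync, rf):
    """Base, A-, B- and A/B-cumulativity cases of the sync barrier."""
    return (sync | compose(rf, sync) | compose(sync, rf)
            | compose(compose(rf, sync), rf))


def is_cyclic(rel):
    """True if the transitive closure of rel relates some event to itself."""
    closure = set(rel)
    while True:
        extra = compose(closure, closure) - closure
        if not extra:
            return any(x == y for (x, y) in closure)
        closure |= extra


# Forbidden execution of isa1: 1:r1=1; 2:r1=2; 2:r2=0.
rf = {("a", "b"), ("c", "d")}
sync = {("b", "c"), ("d", "e")}
fr = {("e", "a")}

ab = ab_sync(sync, rf)
print(sorted(ab))
print(is_cyclic(ab | fr))    # cumulativity closes the cycle a, c, e, a
print(is_cyclic(sync | fr))  # a non-cumulative reading of sync would not
```

The pairs $(a, c)$ and $(c, e)$ arise from A-cumulativity and $(a, d)$ from A/B-cumulativity, matching the edges discussed above.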

isa2. The more complex test isa2 (figure 10) is a refinement of isa1: a chain of store-reads from $P_0$ to $P_2$ that passes through $P_1$. But $P_0$ now performs two stores to locations $x$ and $y$, separated by a sync; while $P_1$ loops loading $y$ until it reads the value 2 written to $y$ by $P_0$, before storing value 3 to location $z$. $P_2$ remains essentially unchanged. As for isa1, the architecture specification forbids that $P_2$ loads value 0 from location $x$ (third instruction) when it has loaded (first instruction) the value (here 3) stored by $P_1$ in some memory location used for communicating (here $z$). A key observation is the absence of a barrier in the code of $P_1$. Instead, we have a control dependency. The example being official, we assume that such a control dependency suffices to prevent the last load of $P_2$ from reading value 0.

Test isa2:
P0              P1                  P2
(a) x ← 1       L1:                 (f) r3 ← z
sync            (c, d) r2 ← y       sync
(b) y ← 2       cmp r2,2            (g) r1 ← x
                bne L1
                (e) z ← 3
Forbidden: 2:r3=3; 2:r1=0;

[Figure 10: litmus test isa2 and the execution witness discussed in the text.]

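In the presence of a conditional branch, there is a clear distinction between program text and execution, or, more precisely, between program listing order and program order po. We select a particular (invalid) execution witness generated by memevent, where $P_1$ executes two loop iterations (figure 10). The control dependency is expressed as the two edges $c \xrightarrow{\textit{ppo}} e$ and $d \xrightarrow{\textit{ppo}} e$. The execution is non-SC, by the existence of the cycle $a \xrightarrow{\textit{po}} b \xrightarrow{\textit{rf}} d \xrightarrow{\textit{po}} e \xrightarrow{\textit{rf}} f \xrightarrow{\textit{po}} g \xrightarrow{\textit{fr}} a$. The execution is also invalid in our Power model, since ghb is cyclic. We clearly identify two cycles: $a \xrightarrow{\textit{ab}} d \xrightarrow{\textit{ppo}} e \xrightarrow{\textit{ab}} g \xrightarrow{\textit{fr}} a$ and $a \xrightarrow{\textit{ab}} d \xrightarrow{\textit{ppo-ext}} f \xrightarrow{\textit{sync}} g \xrightarrow{\textit{fr}} a$. Note that the ppo-ext extension is not needed to conclude that this outcome is invalid in our Power model.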

5.2 Classical tests

In the previous section, we have demonstrated that our Power model is correct w.r.t. the two official litmus tests that are publicly available. Clearly, two tests are insufficient to draw any conclusion and we need more.

Some litmus tests are conventional, such as iriw (Independent Reads of Independent Writes, figure 11) and rwc (Read To Write Causality, figure 12); see for instance [13] and [4, Example 7.7].

Figures 11 and 12 show non-SC execution witnesses, which are the ones of interest. To see that the executions we consider are non-SC, it suffices to follow rf, fr and po arrows in any graph, so as to find a cycle. These graphs also show that the executions considered are invalid in our Power model, by the presence of bold ghb cycles.

We cannot conclude from the public documentation [5] whether these two tests are invalid on the Power architecture; therefore we resort to experimentation.

5.3 Experiments

In experiments, we observe a selection of the final values of registers and of memory locations, yielding outcomes. In the case of the four litmus tests isa1 to rwc, the final values of the registers written to by the load instructions suffice to identify the non-SC execution depicted. For instance, the outcome [1:r1=1; 1:r2=0; 2:r3=0;] suffices to identify the non-SC execution of rwc.

We performed experiments on two machines, doko and hpcx. doko is a 4-core Power5 machine, running Linux; while hpcx is one 16-core eServer 575, running AIX.
