A Modular Framework for Verifying Versatile Distributed Systems



This is an author's version published in: http://oatao.univ-toulouse.fr/24923

To cite this version: Chevrou, Florent and Hurault, Aurélie and Quéinnec, Philippe. A Modular Framework for Verifying Versatile Distributed Systems. (2019) Journal of Logical and Algebraic Methods in Programming, 108. 24-46. ISSN 2352-2208

Official URL (DOI): https://doi.org/10.1016/j.jlamp.2019.05.008



Florent Chevrou, Aurélie Hurault, Philippe Quéinnec ∗

Université de Toulouse, IRIT, 2, rue Charles Camichel, 31000 Toulouse, France

Keywords: Distributed systems; Asynchronous communication; Multicast; Compatibility checking; TLA+

Abstract

Putting independent components together is a common design practice of distributed systems. Besides, there exists a wide range of interaction protocols that dictate how these components interact, which impacts their compatibility. However, the communication model itself always consists in a monolithic description of the rules and properties of the communication. In this paper, we propose a mechanized framework for the compatibility checking of compositions of peers where the interaction protocol can be fine-tuned through assembly of basic properties on the communication. These include whether the communication is point-to-point, multicast or convergecast, which ordering policies are to be applied, applicative priorities, bounds on the number of messages in transit, and so on. Among these properties, we focus on a generic description of multicast communication that encompasses point-to-point and one-to-all communication as special cases. The components that form the communication model are specified in TLA+, and a system, composed of a communication model and a specification of the behavior of the peers (also in TLA+), is checked with the TLA+ model checker. Finally, we provide theoretical views on the relations between ordering policies through the lenses of multicast and convergecast communication.

1. Context

1.1. Introduction

Distributed systems are a composition of individual components, the peers, that exchange messages and work towards a common goal. Their interactions are governed by a protocol, or communication model, that specifies whether the emission or the reception of a message is possible. For example, synchronous communication dictates that a message shall be sent and received at the same time (rendez-vous). In asynchronous communication, though, which this paper focuses on, the emission and the reception of a message do not happen simultaneously: the two events occur with a delay. This results in many possible interleavings of the communication events, some of which might jeopardize the compatibility or the correctness of a composition of peers unless specific properties on the communication are met. Such properties include whether the communication is point-to-point, multicast or convergecast, numerous message-ordering policies that state that some messages have to be delivered in their emission order, bounds on the number of messages in transit, and applicative priorities ensuring that some messages or recipients have precedence over others. Any conjunction of these properties is a unique communication model. Yet, existing verification frameworks consider the interaction protocol to be an indivisible entity that may be, at best, parameterized (e.g. capacity of queues) or entirely substituted by another.

✩ This work was supported by project PARDI ANR-16-CE25-0006.

∗ Corresponding author.

E-mail addresses: florent.chevrou@enseeiht.fr (F. Chevrou), aurelie.hurault@enseeiht.fr (A. Hurault), philippe.queinnec@enseeiht.fr (P. Quéinnec)

In this paper, we describe an extensible framework where the communication model is any desired conjunction of communication properties we call “micromodels”. We allow for different combinations to apply on different parts of the distributed system: for instance multicast causally ordered communication on some of the peers and point-to-point capped FIFO ordered communication on another subsystem. Each micromodel is a transition system specified in TLA+ whose transitions account for an emission or a delivery of a message and whose states may fit any convenient data structure, no matter how the rest of the communication is described. For instance, a simple specification of the micromodel corresponding to the property “there are at most n messages in transit” is a set in which a message is added after an emission, removed after a reception, and that prevents any further emissions when it contains n messages. As an example, it may coexist with a micromodel that enforces a message delivery order using queues. A system to verify consists of the product of the micromodels and the behavior of the peers, specified in TLA+. The correctness of the system is checked with TLC, the TLA+ model checker. This correctness can be any linear temporal property (safety or liveness) that TLA+ can express.

The contributions of the paper are the following. We provide a library of TLA+ modules that specify the behavior of various micromodels. First, physical micromodels deal with the multiplicities of delivery: point-to-point communication (a message is delivered to one peer), multicast communication (a message has several receivers) and convergecast communication (a peer receives a set of messages). One notable contribution is a generic specification (in one single micromodel) of multicast communication that encompasses point-to-point and one-to-all communication as special cases. Combined with these physical micromodels, the framework includes nine micromodels that control emission and reception. The complete files of the framework are available online [12]. Lastly, this paper includes a theoretical study that compares the expressive power of the message-ordering micromodels.

The outline of this paper is as follows. This introduction presents a running example. Section 2 provides an introduction to the TLA+ specification language. Section 3 presents the overall design of our verification framework and the modular design of communication models. Section 4 details several micromodels: a universal micromodel of communication for both the point-to-point and multicast paradigms, a micromodel for convergecast communication, and several message-ordering models that are used in combination with the previous communication paradigms. Section 5 studies the relations between these message-ordering models with multicast and convergecast communication. Section 6 explores related work, and finally Section 7 sums this work up and paves the way for further developments.

1.2. Running example: a conference reviewing system

As a running example, we present a conference reviewing system. This system is composed of peers that are the authors, the chairs of the program committee and the reviewers. Authors send their papers to all the PC chairs. An author can submit only one paper. Each chairperson attributes a paper number and takes responsibility for a part of the papers, based on this number. After the deadline has passed, the chairs reject new submissions and inform the authors. After the deadline, each chair independently sends his papers to some reviewers, waits for the reviews, and sends the acceptance result to the author. The system must ensure that it does not deadlock and that every author eventually receives a unique answer (either rejection for a late paper, or acceptance result if reviewed).

Fig. 1 shows a possible execution: three authors send their paper on channel submission to all the chairs (multicast communication, black plain arrows). Chair 1 will handle the odd messages (i.e. the ones from authors 1 and 3) and chair 2 will handle the even messages (i.e. the one from author 2). The chairs forward the papers that they handle on channel paper to a set of two or three reviewers (multicast communication, blue dashed arrows). Paper 1 (from author 1) is sent by chair 1 to all three reviewers, and paper 2 (from author 2) is sent by chair 2 to reviewers 1 and 3. Author 3 submits her paper after the deadline and is rejected by chair 1. The chairs wait on channel review for at least two reviews per paper (convergecast communication, green dotted-dashed arrows), and send on channel acceptation the acceptance result to the authors (point-to-point communication, red dotted arrows).

2. TLA+ specification language

TLA+ [23] is a formal specification language based on untyped Zermelo-Fraenkel set theory for specifying data structures, and on the temporal logic of actions (TLA) for specifying dynamic behaviors. TLA+ allows one to specify symbolic transition systems with variables and actions. The TLA+ toolbox contains the TLC model checker (an enumerative explicit-state model checker), the TLAPS proof assistant, and various tools such as a translator for the PlusCal Algorithm Language [24] into a TLA+ specification.

Expressions rely on standard first-order logic, set operators, and several arithmetic modules. Hilbert's choice operator, written as CHOOSE x ∈ S : p, deterministically picks an arbitrary value in S which satisfies p, provided such a value exists (its value is undefined otherwise).

Fig. 1. An execution for the reviewing example. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

Functions are primitive objects in TLA+, and tuples are a particular kind of function. The application of function f to an expression e is written as f[e]. The set of functions whose domain is X and whose co-domain is a subset of Y is written as [X → Y]. The expression DOMAIN f is the domain of the function f. The expression [x ∈ X ↦ e] denotes the function with domain X that maps any x ∈ X to e. The notation [f EXCEPT ![e1] = e2] is a function that is equal to the function f except at point e1, where its value is e2. Records are functions with domain the names of the fields. As a shorthand, r[“field”] is written r.field, and [r EXCEPT !.field = val] is a new record that is equal to r except for the field field which gets val. In the val expression, @ can be used to refer to the initial value of the field: [r EXCEPT !.count = @ + 1] means [r EXCEPT !.count = r.count + 1].
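As a rough analogy only (this is Python, not TLA+), a function with a finite domain can be pictured as a dictionary, and EXCEPT as a non-destructive update of that dictionary. The helper names `make_fn` and `fn_except` below are hypothetical and introduced purely for illustration; the callback passed to `fn_except` plays the role of @.

```python
def make_fn(domain, body):
    """Model [x ∈ X ↦ e]: map every x of the domain to body(x)."""
    return {x: body(x) for x in domain}


def fn_except(f, point, update):
    """Model [f EXCEPT ![point] = e2].

    `update` receives the old value at `point`, like @ in TLA+.
    The original mapping is left untouched; a new one is returned.
    """
    g = dict(f)
    g[point] = update(f[point])
    return g


squares = make_fn({1, 2, 3}, lambda x: x * x)

# Records are functions whose domain is the set of field names.
record = {"count": 0, "name": "r"}
bumped = fn_except(record, "count", lambda old: old + 1)  # [r EXCEPT !.count = @ + 1]
```

After this runs, `record` still has `count` equal to 0 while `bumped` has `count` equal to 1, which mirrors the fact that EXCEPT denotes a new value rather than an assignment.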

Modules are used to structure complex specifications. A module contains constant declarations, variable declarations, and definitions. A module can extend other modules, importing all their declarations and definitions. A module can also be an instantiation of another module. The module MI ≜ INSTANCE M WITH q1 ← e1, q2 ← e2, ... is an instantiation of module M, where each symbol qi is replaced by ei (qi are identifiers specifying constants or variables of module M, and ei are expressions). Then MI!x references the symbol x of the instantiated module.

Other than constant and variable declarations, a module contains definitions in the form Op(arg1, ..., argn) ≜ exp. This defines the symbol Op such that Op(e1, ..., en) equals exp, where each argi is replaced by ei. In case of no argument, it is written as Op ≜ e. A definition is just an abbreviation or syntactic sugar for an expression, and never changes its meaning.

The dynamic behavior of a system is expressed in TLA+ as a transition system, with an initial state predicate, and actions to describe the transitions. An action formula describes the changes of state variables after a transition. In an action formula, x denotes the value of a variable x in the origin state, and x′ denotes its value in the destination state. A prime is never used to distinguish symbols but always means “in the next state”. ENABLED A is a predicate which is true in a state iff the action A is feasible, i.e. there exists a next state such that A is true.

A specification of a system is written as Init ∧ □[Next]_vars ∧ F, where Init is a predicate specifying the initial states, □ is the temporal operator that asserts that the formula following it is always true, Next is the transition relation, usually expressed as a disjunction of actions, [Next]_vars is defined to equal Next ∨ vars′ = vars (Next with stuttering), and F expresses fairness conditions. Fairness is usually expressed as a conjunction of weak or strong fairness on actions WF_vars(A1) ∧ WF_vars(A2) ... ∧ SF_vars(Ai) .... Weak fairness WF_v(A) means that either infinitely many A steps occur or A is infinitely often disabled. In other words, an A step must eventually occur if A is continuously enabled. Strong fairness SF_v(A) means that either infinitely many A steps occur or A is eventually disabled forever. In other words, an A step must eventually occur if A is repeatedly enabled.
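The difference between the two fairness notions can be illustrated on lasso-shaped behaviors, i.e. behaviors that end up repeating a finite cycle of states forever. The Python sketch below is not part of the framework: it assumes a hypothetical encoding where each state of the cycle is a pair (A is enabled, the step leaving the state is an A step), and it ignores the finite prefix, which has no influence on "infinitely often" conditions.

```python
from typing import List, Tuple

# One entry per state of the repeating cycle:
# (A is enabled in this state, the step leaving this state is an A step).
Cycle = List[Tuple[bool, bool]]


def weak_fair(cycle: Cycle) -> bool:
    """WF(A): infinitely many A steps occur, or A is infinitely often disabled."""
    a_step_occurs = any(taken for _, taken in cycle)
    disabled_somewhere = any(not enabled for enabled, _ in cycle)
    return a_step_occurs or disabled_somewhere


def strong_fair(cycle: Cycle) -> bool:
    """SF(A): infinitely many A steps occur, or A is eventually disabled forever."""
    a_step_occurs = any(taken for _, taken in cycle)
    disabled_everywhere = all(not enabled for enabled, _ in cycle)
    return a_step_occurs or disabled_everywhere


# A is always enabled but never taken: violates both WF and SF.
always_enabled_never_taken = [(True, False), (True, False)]
# A is enabled every other state but never taken: satisfies WF, violates SF.
toggling_never_taken = [(True, False), (False, False)]
```

The second cycle is the typical case where strong fairness is needed: A is repeatedly but not continuously enabled, so weak fairness does not force an A step.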

System properties are specified using linear temporal logic (LTL). □φ means that φ holds in every suffix of the behavior. ◇φ is defined to equal ¬□¬φ and means that φ eventually holds in a subsequent state. ψ ⇝ φ is defined to equal □(ψ ⇒ ◇φ) and means that, whenever ψ holds, then later φ holds.

3. Overview of the verification framework

The goal of the framework is to check the compatibility or the correctness of a composition of peers under specific properties on the communication. The key feature is a strict separation of concerns between the specification of the peers and the specification of the communication properties. So, the distributed system consists in the product of two transition systems: the composition of peers and the communication model. Both are labeled by localized communication events and are written in separate TLA+ modules that are connected during the verification process carried out by model checking using TLC.

In this section, the initial presentation only considers point-to-point and multicast communication to avoid introducing too many concepts. Convergecast communication will be added later (Section 4.2) with minor changes.

3.1. Specification of a composition of peers

The specification of a composition of peers is a TLA+ module that describes the state of each peer in the distributed system and specifies their behavior according to transition predicates (actions). There is no restriction on the design of the specification of the composition as long as there is at most one communication action (send, receive or ignore) per transition. The actions in the composition usually consist in a conjunction of a communication action and a local state change. In practice, the state of the composition is usually a vector of every peer's state and the actions are local. During an action in the system, the state of a peer evolves either spontaneously or alongside a communication action. Nevertheless, the framework does not forbid that some state is shared by several peers, or that the peers evolve synchronously. It is up to the designer to decide if the only exchanges between peers occur with the available communication actions, or if some hidden communication channels are used.

3.1.1. Communication actions

The available communication actions provided by our framework follow.

Send. send(sender, receivers, channel, data) is enabled when the emission of a message by sender on channel channel is possible. We use the channel as an indirection on the notion of destination peer (point-to-point) or destination group (multicast). Besides, it makes it possible to specify systems where channels are not statically associated to a given sender and to a given group of receivers. receivers restricts the set of possible receivers for this message: it is usually the set of all peers since channels dynamically account for the destination or destination group, but it may be used to narrow a possible set of receivers down or to send a message to an explicit destination. Finally, the payload of the message is data, without restriction on its type, which can be adapted on a case-by-case basis. This payload is retrieved at delivery.

Receive. receive(receiver, channel, data) is enabled when the reception of a message by receiver on channel channel that contains data is possible. We assume peers cannot prevent a delivery based on the content data of the message: the communication model imposes the message to be received and the content is only available afterwards. Therefore, in practice, a receive action in the specification of a composition has the form ∃ data ∈ DATATYPE : receive(_, _, data) ∧ P(data) where P(data) is a transition predicate that covers all the possible values of data in DATATYPE. This means that the next state of the receiver may depend on data but the enabledness of the reception itself is independent of this value.

Ignore. ignore(peer, channels) is always enabled. It states that peer does not expect to receive messages from the channels in channels anymore. The set of channels a peer has not ignored is called the interest of this peer. Ignoring a channel cannot be reverted as this would otherwise break delivery orderings: ignoring a channel and later getting interested in it again would make it possible to temporarily bypass the message dependencies. The interest is crucial to the specification of some communication properties including multicast communication as detailed later in Section 4.1.1.

3.1.2. Back to the reviewing system

The reviewing system has been described in the PlusCal Algorithm Language, which is translated by the TLA+ tools to a TLA+ specification. An excerpt is given in Fig. 2.

The peers (processes in PlusCal language) are the authors, the chairs and the reviewers. Only the reviewers are described in the excerpt. The reviewers have two actions: they can either receive a message on channel paper or send a message on channel review. For a message to be received on the channel paper, the reviewer must not have more than 4 papers to review, and the paper is added to his list of papers to review. For a message to be sent on review, it must concern a paper the reviewer has to review, and the paper is removed from his list of papers to review.

3.2. Specification of a communication model

A communication model is responsible for collecting messages sent by the peers, and delivering them to the relevant peers. In our framework, it is a combination of instances of micromodels, each corresponding to a subset of channels of the system. We will first see the structure of the micromodels before explaining how the different micromodels interact to form the communication model.

    module reviewing
    ....
    --algorithm reviewing
    ....
    fair process Reviewer ∈ IdReviewers
    variable
      readinglist = {};    \* for each reviewer, the papers he has to review
    begin
    rl0:                   \* listen only on channel "paper"
      await ignore(self, CHANNELS \ {"paper"});
    rl1:
      while TRUE do
        either             \* receive a paper to review
          await Cardinality(readinglist) ≤ 4;
          with paper ∈ IdPapers do
            await COM!receive(self, "paper", paper);
            readinglist := readinglist ∪ {paper};
          end with;
        or                 \* send a review to the chairs
          with paper ∈ readinglist do
            await COM!send(self, IdChairs, "review", ⟨self, paper⟩);
            readinglist := readinglist \ {paper};
          end with;
        end either;
      end while;
    end process
    end algorithm

Fig. 2. The peers (processes in PlusCal language) are the authors, the chairs and the reviewers. The reviewers have two actions: they can either receive a message on channel paper or send a message on channel review. The chairs and the authors (not shown here) have respectively five and two actions.

3.2.1. Micromodels of communication

As just stated, a communication model is a combination of communication properties we call micromodels. A micromodel has to answer the following two essential questions from which six other questions are derived:

q1) When is the emission of a message, on a given channel, by a given peer, possible?
q2) When is the delivery of a message, on a given channel, to a given peer, possible?

In order to address these questions, the specification of a micromodel, a TLA+ module, relies on its current state.

q3) Which information must the state carry?

Besides, a micromodel can be parameterized by constants in the module. For example, a micromodel corresponding to the property “the number of messages in transit is capped” has a parameter: the bound, and its state is the set of messages in transit. An emission requires the cardinality of this set not to exceed the limit and a delivery is always possible. The sole purpose of this micromodel is to limit the number of messages in transit and it imposes no constraint on the deliveries: the basics of the communication such as “a message must have been sent before it is delivered” are part of another micromodel involved alongside. Micromodels are complementary with minimum overlap.

The remaining questions are then:

q4) What is the initial state?
q5) How does the state evolve after an emission?
q6) How does the state evolve after a delivery?
q7) How does the state evolve after some channels are ignored by a peer?

Since we aim at modeling both point-to-point and multicast communication, the answer to the last two questions is not trivial. Consider a micromodel that specifies either point-to-point or multicast communication and let us combine it with our example cap micromodel, characterized by a set of messages in transit. When performing a reception in this micromodel, the resulting state depends on the communication paradigm: the delivered message must be removed when the communication is point-to-point (the message is not in transit anymore) but the set may be left unchanged when the communication is multicast (the message remains in transit for further deliveries). We therefore distinguish two classes of micromodels: physical and non-physical. Physical micromodels specify when a message is removed from the communication model because it can no longer be received. Non-physical models specify predicates that control the sending and receiving of messages but are not concerned by the lifetime of a message. This information is fed to non-physical models by the physical models so they can evolve in a consistent way.

    module message_cap
    EXTENDS Naturals, FiniteSets
    CONSTANTS ID, PEERS, CHANNEL, BOUND      \* Maximum nb of messages in transit

    PhysicalMicromodel ≜ FALSE                                  \* q8

    \* The state consists of one field: the ids of the messages in transit.
    TypeInvariant(s) ≜ s ∈ [idInTransit : SUBSET ID]             \* q3
    Init ≜ [idInTransit ↦ {}]                                    \* q4
    usedIds(s) ≜ s.idInTransit

    preSend(s, id, from, to, channel, data) ≜                    \* q1
        Cardinality(s.idInTransit) < BOUND
    postSend(s, id, from, to, channel, data) ≜                   \* q5
        [s EXCEPT !.idInTransit = s.idInTransit ∪ {id}]
    preReceive(s, id, to, channel, data) ≜ TRUE                  \* q2
    postReceive(s, id, to, channel, data, remove) ≜              \* q6
        IF remove THEN [s EXCEPT !.idInTransit = @ \ {id}] ELSE s
    postIgnore(s, peer, chan_set, removedIds) ≜                  \* q7
        [s EXCEPT !.idInTransit = s.idInTransit \ removedIds]

Fig. 3. TLA+ module of a parameterized micromodel that caps the number of messages in transit. The annotations q1 to q8 indicate the answers to the questions a micromodel has to address.

q8) Is the micromodel physical?

The specification of any micromodel, such as our example (message cap) whose TLA+ specification is in Fig. 3, must answer each of the eight questions q1 to q8. The answer to q8 is a boolean PhysicalMicromodel; q1 and q2 are predicates preSend and preReceive that depend on the current state of the micromodel, the sender or receiver, the channel, and the data contained in the message; q3 is a type predicate TypeInvariant depending on the current state s; q4 is the value Init of the initial state; q5, q6, and q7 are the values postSend, postReceive, postIgnore of the state after the operation. postSend and postReceive share the interface of preSend and preReceive; postIgnore depends on a peer and a set of channels to ignore. Additionally, in the specification of non-physical micromodels, postReceive has an additional boolean parameter remove stating whether the received message should be removed or kept in transit, and postIgnore has a set removedIds of messages to remove.
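To make the eight-question interface more concrete, here is a minimal Python sketch of the message cap micromodel. It is an informal transcription, not the TLA+ module of the framework: the class name `MessageCap`, the method names and the reduced parameter lists (peer, channel and data are dropped because this micromodel does not use them) are assumptions made for the illustration. States are immutable, so each post operation returns a new state, as a TLA+ expression would.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet


@dataclass(frozen=True)
class MessageCap:
    """Non-physical micromodel: at most `bound` messages in transit."""
    bound: int
    in_transit: FrozenSet[int] = frozenset()  # q3 and q4: the state, initially empty
    PHYSICAL = False                           # q8 (plain class attribute, not a field)

    def pre_send(self) -> bool:                # q1: emission allowed below the bound
        return len(self.in_transit) < self.bound

    def post_send(self, msg_id: int) -> "MessageCap":   # q5
        return replace(self, in_transit=self.in_transit | {msg_id})

    def pre_receive(self) -> bool:             # q2: this micromodel never blocks a delivery
        return True

    def post_receive(self, msg_id: int, remove: bool) -> "MessageCap":  # q6
        # `remove` is decided by the physical micromodel of the channel.
        if remove:
            return replace(self, in_transit=self.in_transit - {msg_id})
        return self

    def post_ignore(self, removed_ids: FrozenSet[int]) -> "MessageCap":  # q7
        return replace(self, in_transit=self.in_transit - removed_ids)


# Two emissions fill a cap of size 2.
cap = MessageCap(bound=2).post_send(1).post_send(2)
```

With `remove=False` (a multicast delivery that keeps the message in transit) the cap stays full, whereas `remove=True` (for instance a point-to-point delivery) frees a slot.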

3.2.2. Assembly of a communication model

The following details an example communication model whose structure is summed up in Fig. 4. This assembly states that, among channels a, b, c, d, e, and f, the communication has the property of a given micromodel on channels a, b, and c (say a message ordering property) and that another property (hence another instance of a micromodel such as the message cap micromodel) is associated to channels c and d. Overlaps are possible: communication on channel c has both the message ordering and the message cap properties. A micromodel can also be instantiated more than once: the first micromodel can be instantiated again on channels e and f, which would mean messages on e and f are ordered, messages on a, b, and c are ordered, but there is no guarantee on the ordering of a message of the first group and a message of the second group.

As stated earlier, a physical micromodel dictates when a message no longer exists in the communication model (e.g. after the first delivery if the physical micromodel is point-to-point communication) and the information is used by the non-physical micromodels to update their local state. This implies that every channel must be associated to exactly one physical micromodel. In particular, the sets of channels of physical micromodels must not overlap. Otherwise, two physical micromodels could disagree on whether to remove a message on a shared channel. However, the restriction does not apply to non-physical micromodels: the sets of channels may overlap or extend beyond the domains of physical micromodels. Given a communication model that is part point-to-point, part multicast, it is possible to limit the number of messages in transit on the whole communication model with a message cap instance that encompasses the domains of both the point-to-point and multicast physical micromodels.
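The constraint "exactly one physical micromodel per channel" can be checked mechanically on a layout. The Python sketch below is only an illustration: the dictionary encoding, the `physical` flag and the function name `physical_coverage_errors` are assumptions, and the channel and micromodel names are borrowed from the reviewing example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def physical_coverage_errors(channels: Iterable[str], layout: List[Dict]) -> Dict[str, int]:
    """Return the channels that are not covered by exactly one physical
    micromodel, mapped to the number of physical micromodels claiming them."""
    counts = Counter()
    for micromodel in layout:
        if micromodel["physical"]:
            counts.update(micromodel["chan"])
    # A Counter returns 0 for channels that no physical micromodel covers.
    return {c: counts[c] for c in channels if counts[c] != 1}


CHANNELS = {"submission", "paper", "review", "acceptation"}
LAYOUT = [
    {"name": "multicast", "physical": True, "chan": {"submission"}},
    {"name": "multicast", "physical": True, "chan": {"paper"}},
    {"name": "convergecast", "physical": True, "chan": {"review"}},
    {"name": "p2p", "physical": True, "chan": {"acceptation"}},
    # Non-physical micromodels may overlap freely.
    {"name": "fifon1", "physical": False, "chan": {"submission"}},
    {"name": "message_cap", "physical": False, "chan": CHANNELS},
]
```

On this layout the function returns an empty dictionary; adding a second physical micromodel on `paper` would report `{"paper": 2}`, and dropping the first entry would report `{"submission": 0}`.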

The architecture of the reviewing system is given in Fig. 5. As an author writes to all the chairs, the communication is multicast on channel submission. A chair forwards the papers it handles to all the reviewers, so the communication is also multicast on channel paper. A chair waits for all the reviews before deciding on acceptance, so the communication is convergecast on channel review. The acceptance result is a message between a chair and an author so the communication is point-to-point on channel acceptation. In addition, ordering of delivery is required to ensure that the chairs get the papers in the same order and assign the same number to a paper (FIFO n-1 on channel submission), and a constraint on sent messages exists to limit the number of submissions by an author (voting on channel submission). Finally, to limit the state explosion when model checking, a message cap is set on all the channels. Note that this bounded model-checking technique is to be used only to find bugs, as it restricts the checked executions.

Fig. 4. A communication model built as a combination of micromodels. Each channel is associated to a unique physical micromodel.

Fig. 5. Communication model for the reviewing system example, built as a combination of micromodels.

    module reviewing
    CONSTANT NbAuthors, NbChairs, NbReviewers,
             NbMinReviews, NbMaxReviews, Capacity
    CHANNELS ≜ {"submission", "paper", "review", "acceptation"}
    COMMODELS ≜ {
      [name ↦ "multicast", params ↦ [chan ↦ {"submission"}, min ↦ 1, max ↦ NbChairs]],
      [name ↦ "multicast", params ↦ [chan ↦ {"paper"}, min ↦ NbMinReviews, max ↦ NbMaxReviews]],
      [name ↦ "convergecast", params ↦ [chan ↦ {"review"}, min ↦ NbReviews, max ↦ NbReviews]],
      [name ↦ "p2p", params ↦ [chan ↦ {"acceptation"}]],
      [name ↦ "fifon1", params ↦ [chan ↦ {"submission"}]],
      [name ↦ "voting", params ↦ [chan ↦ {"submission"}, bound ↦ 1]],
      [name ↦ "message_cap", params ↦ [chan ↦ CHANNELS, bound ↦ Capacity]]}
    COM ≜ INSTANCE multicom WITH
      PEERS ← IdAuthors ∪ IdChairs ∪ IdReviewers,
      COM ← COMMODELS,
      CHANNEL ← CHANNELS
    ....

Fig. 6. An excerpt of the conference reviewing system. COMMODELS specifies the properties of the channels (e.g. submission is a multicast 1-NbChairs channel).

3.2.3. Interlinking of the micromodels

The TLA+ module that exposes the three communication operations available for the specification of compositions of peers is called the “multicom”. It is instantiated in the specification of the composition of peers with a parameter: the specification of the communication model. The module takes as parameter the desired layout of micromodels of communication and instantiates the resulting communication model, which then enables the peers to interact and exchange information. Fig. 6 gives the setup of the communication model of the reviewing example where a communication model COM is instantiated according to a layout of micromodels described in COMMODELS, in compliance with Fig. 5.

The multicom is a dispatcher that gathers the local states of the micromodels, checks whether an operation is possible (using the pre... predicates), and determines how the local states evolve (using the post... values). The multicom also generates and manages the message identifiers: a message has the same identifier across all the micromodels, which makes it possible to maintain coherence. When an operation is to be performed, say a reception on channel c, the conjunction of all the preReceive predicates of micromodels associated to c determines whether the reception is possible. If so, the new state of the physical micromodel of c is computed. By comparing it to the former state, the set of message identifiers that are no longer in use (i.e. removed messages) is computed. It is provided to the non-physical micromodels whose state is updated afterwards.

Fig. 7. Illustration of the dispatcher role of the multicom. An operation on channel c is initiated by a peer of a composition. It corresponds to a unique atomic TLA+ action. The communication model is described in Fig. 4: for channel c, it does not involve micromodel 2. The conjunction of the guards on the operation determines whether the operation is possible. If so, it is applied on the physical micromodel first and then on the others with knowledge of the removed messages.

    module multicom
    receive(peer, chan, data) ≜
      ∃ id ∈ id.in_use :                  \* There exists a message,
        ∧ ∀ com ∈ COM :                   \* that is receivable in all the micromodels the channel is associated to.
             chan ∈ com.params.chan ⇒ ComPreReceive(com, id, peer, chan, data)
        ∧ LET physicalCom ≜               \* Find the physical micromodel associated to the channel.
                CHOOSE c ∈ COM : c ∈ {com ∈ COM : chan ∈ com.params.chan ∧ ComPhysical(com)}
          IN  \* Compute the next state of the physical micromodel.
              LET postPhysicalState ≜ ComPostReceivePhysical(physicalCom, id, peer, chan, data)
              IN  \* Check if the physical micromodel decides to remove the received message.
                  LET remove ≜ (id ∉ ComUsedIds(physicalCom, postPhysicalState)) IN
                  ∧ s′ = [com ∈ COM ↦     \* Apply state update to all concerned micromodels.
                            IF chan ∈ com.params.chan THEN
                              IF com = physicalCom THEN postPhysicalState
                              ELSE ComPostReceiveNonPhysical(com, id, peer, chan, data, remove)
                            ELSE s[com]]
                  \* Update the id state variable for id generation and reuse.
                  ∧ id′ = [id EXCEPT !.in_use = ComAllUsedNext(AllModels)]

Fig. 8. TLA+ specification of the reception by a peer of a message on a channel. It finds a message that is receivable in all the micromodels the channel is associated to, and updates the state of these micromodels. The physical micromodel decides if the message is removed and the other micromodels get this information to purge their state.

Fig. 7 is a sequence diagram that gives insight into the process. Note that it is purely illustrative: it actually corresponds to a unique atomic TLA+ action, that is a transition predicate involving the conjunctions of the micromodel-specific predicates. An extract of the multicom module is shown in Fig. 8. It presents the receive action receive(peer, chan, data) where peer receives a message on the channel chan with content data. It consists in finding a message that is receivable in all the micromodels the channel is associated to, and then updating the state of these micromodels. The physical micromodel decides if the message is removed (i.e. it is no longer available for another reception) and the other micromodels get this information to purge their state.
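The dispatching of a reception can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the multicom module: the classes `P2PNetwork` and `Cap` and all method names are hypothetical, the micromodels passed to `receive` are taken to be those already associated to the channel, peers and payloads are omitted, and states are mutated in place whereas the TLA+ action is a single atomic transition predicate.

```python
class P2PNetwork:
    """Physical micromodel: a message leaves the network at its first delivery."""
    physical = True

    def __init__(self):
        self.in_transit = set()

    def used_ids(self):
        return set(self.in_transit)

    def post_send(self, msg_id):
        self.in_transit.add(msg_id)

    def pre_receive(self, msg_id):
        return msg_id in self.in_transit

    def post_receive_physical(self, msg_id):
        self.in_transit.discard(msg_id)


class Cap:
    """Non-physical micromodel: tracks ids in transit, never blocks a delivery."""
    physical = False

    def __init__(self):
        self.in_transit = set()

    def post_send(self, msg_id):
        self.in_transit.add(msg_id)

    def pre_receive(self, msg_id):
        return True

    def post_receive_non_physical(self, msg_id, remove):
        if remove:
            self.in_transit.discard(msg_id)


def receive(micromodels, msg_id):
    """Try to deliver msg_id; return True if the delivery took place."""
    # Guard: conjunction of the preReceive predicates of all micromodels.
    if not all(m.pre_receive(msg_id) for m in micromodels):
        return False
    # Exactly one physical micromodel is assumed per channel.
    physical = next(m for m in micromodels if m.physical)
    physical.post_receive_physical(msg_id)
    # The message is removed iff the physical micromodel no longer uses its id.
    remove = msg_id not in physical.used_ids()
    for m in micromodels:
        if not m.physical:
            m.post_receive_non_physical(msg_id, remove)
    return True


net, cap = P2PNetwork(), Cap()
for m in (net, cap):
    m.post_send(1)  # message 1 is known to both micromodels under the same id
```

A first call `receive([net, cap], 1)` succeeds and purges the identifier from both micromodels; a second call fails because the point-to-point network no longer holds the message.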

4. Micromodels in detail

In this section, several micromodels are detailed: first two physical micromodels for multicast communication and convergecast communication, and then non-physical micromodels that constrain the emission of messages or their delivery order.

4.1. A physical micromodel for multicast communication

A physical micromodel for asynchronous point-to-point communication can be modeled as a set of messages in transit, initially empty: the network. The send action is always enabled, and adds the message to the network. Delivering a message requires it to be in the network and removes it. Obviously, a message is delivered at most once. In order to describe multicast communication that allows multiple deliveries of a message (at most one per peer), the lifespan of a message in transit must be extended to encompass multiple deliveries while making it possible to eventually suppress it.

4.1.1. Lifespan of messages in transit

Sending the messages over and over. A simple solution would send the message again once it has been received so it can be received another time by another peer. There are two problems. First, this solution does not specify when to stop sending messages again. Second, when considering message-ordered communication where the order of the emissions matters (e.g. messages must be received in their emission order), sending a message again might modify the ordering. For instance, send m1 followed by m2, then deliver m1. The semantics of this solution implies that m1 is put back in the network and the new ordering is m2·m1 instead of m1·m2, the actual order of the multicast emission.

Never removing the messages from the network. Were the messages to remain in the network forever, they could be received as many times as needed. Once again however, this might conflict with some ordering policies. Assume that messages must be received in their emission order, that is to say the network can be viewed as a global queue, and consider two messages in transit. Even after all the peers have received the first message, since it remains in transit forever, none of them will ever receive the second (not first in queue) and the system will deadlock.

Removing a message from the network once delivered to all the peers. The previous issue is overcome by removing a message from the network after it has been delivered to every peer. Still, this means that all the peers must be ready to receive all the messages in order not to block the system. This requirement is too strong to allow for the verification of interesting and realistic systems: the specification of a peer should not depend on the noise in the environment it takes part in.

Removing when no one is interested. In order not to impose the delivery of a message that a peer has nothing to do with, and never will, we rely on the concept of interest. A peer is interested in some channels only: it expects messages on these channels. Over time, the peer may lose interest in some or all of them: either the expected deliveries have occurred or the peer has ruled out the possibility of ever receiving the messages. Action ignore of the communication model allows a peer to lose interest in a given set of channels, as described in the previous section. The interest of the peers is part of the state of the multicast micromodel. The most sensible behavior would be to remove a message from the network as soon as the last peer interested in the channel of this message receives or ignores it. However, a more generic approach is only a few tweaks away from this main rule.

4.1.2. A generic description for point-to-point, multicast, and one-to-all communication

The proposed operational specification of multicast communication is adapted to become generic and encompass, in particular, point-to-point communication. Consider two parameters of the communication denoted MIN and MAX.

• MIN is the minimal number of times a message must be received before it is removed from the network when no peer is interested;

• MAX is the maximal number of times a message can be received before it is removed from the network regardless of the interest.

Let N denote the number of peers in the system. Up until now, we have described multicast(0,N) communication: a message is removed from the network when the corresponding channel does not interest any peer.

Point-to-point communication corresponds to multicast(1,1). Indeed, a message must be received at least once before it can be removed from the network and must not be received more than once. This means it is immediately removed from the network following the first reception, never before. Similarly, multicast(1,N) corresponds to multicast communication where at least one peer must receive a message before it is removed, and multicast(N,N) models one-to-all communication where a message must be received by all the peers (including the sender) before it is removed from the network, regardless of the interest. MIN and MAX can also take any other value between 0 and N.

Fig. 9 illustrates the differences between multicast(0,N), multicast(1,1), and multicast(N,N) with a common example scenario involving a global message-ordering policy (the network consists of a global common queue of messages). It shows the possible constraints and deadlocks that arise from combining two micromodels: a variant of multicast, and the global ordering policy.

The complete specification of the proposed generic micromodel is presented in Fig. 10 and consists of two state variables, network and interest. The first one is a set of messages in transit which expands after each new emission; the second one contains, for each peer, the set of channels that it has not ignored (i.e. its interest). A message is composed of meta-data including its unique identifier provided by the multicom, the sender, channel, and a set receivedBy of peers it has already been delivered to in order to prevent multiple deliveries to the same peer (see preReceive). After a delivery (see postReceive), the message is removed from the network if MAX receptions have occurred (after the first delivery in point-to-point, i.e. multicast(1,1)). When channels are ignored by a peer (see postIgnore), the interest is updated, and messages that no longer interest any peer are removed from the network unless they have not been delivered at least MIN times yet.

                        interest                         network
    Operation   p1       p2       p3         (0,N)    (1,1)    (N,N)
                {a, b}   {a, b}   {a, b}     ∅        ∅        ∅
    p1 i a      {b}      {a, b}   {a, b}     ∅        ∅        ∅
    p1 ! a      {b}      {a, b}   {a, b}     a        a        a
    p1 ! b      {b}      {a, b}   {a, b}     a · b    a · b    a · b
    p2 ? a      {b}      {a, b}   {a, b}     a · b    b        a · b
    p2 i a      {b}      {b}      {a, b}     a · b    b        a · b
    p3 ? a      {b}      {b}      {a, b}     a · b    ⊥ (1)    a · b
    p3 i a      {b}      {b}      {b}        b                 a · b
    p1 ? b      {b}      {b}      {b}        b                 ⊥ (2)

    (1) The message is not in the network anymore (MAX = 1).
    (2) The message on a is still in the network (MAX = N) and must be received first according to the current ordering policy.

Fig. 9. Evolution of the state of the communication according to different instances of multicast(*,*) with global message-ordering, channels a and b, and N = 3 peers (pi), i ∈ 1..N. The network is represented by a queue. ! means "send", ? means "receive", i means "ignore".
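The keep-or-drop rule of the multicast(MIN, MAX) micromodel can be illustrated outside TLA+. The following Python sketch is not the authors' code: the class `Multicast`, its field names and the helper `scenario` are hypothetical, and the global ordering policy of Fig. 9 is deliberately left out. It only mirrors the logic of postReceive and postIgnore under those assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Multicast:
    """Illustrative state of a multicast(MIN, MAX) micromodel (hypothetical names)."""
    peers: tuple
    channels: tuple
    min_recv: int                                   # MIN
    max_recv: int                                   # MAX
    network: dict = field(default_factory=dict)     # message id -> (channel, receivedBy)
    interest: dict = field(default_factory=dict)    # peer -> channels not ignored

    def __post_init__(self):
        # Initially every peer is interested in every channel.
        self.interest = {p: set(self.channels) for p in self.peers}

    def _interested(self, channel):
        return any(channel in chans for chans in self.interest.values())

    def send(self, msg_id, channel):
        # Sending is always enabled: the message joins the network.
        self.network[msg_id] = (channel, frozenset())

    def can_receive(self, msg_id, peer):
        if msg_id not in self.network:
            return False
        channel, received_by = self.network[msg_id]
        return channel in self.interest[peer] and peer not in received_by

    def receive(self, msg_id, peer):
        assert self.can_receive(msg_id, peer)
        channel, received_by = self.network[msg_id]
        received_by = received_by | {peer}
        # Keep the message while MAX is not reached and either MIN is not
        # reached or some peer is still interested in the channel.
        keep = len(received_by) < self.max_recv and (
            len(received_by) < self.min_recv or self._interested(channel))
        if keep:
            self.network[msg_id] = (channel, received_by)
        else:
            del self.network[msg_id]

    def ignore(self, peer, chans):
        self.interest[peer] -= set(chans)
        # Purge messages nobody is interested in, unless MIN is not reached.
        self.network = {
            i: (c, r) for i, (c, r) in self.network.items()
            if len(r) < self.min_recv or self._interested(c)
        }


def scenario(min_recv, max_recv):
    """Replay the operations on channel a of Fig. 9, without the ordering policy."""
    mc = Multicast(("p1", "p2", "p3"), ("a", "b"), min_recv, max_recv)
    mc.ignore("p1", {"a"})
    mc.send(1, "a")
    mc.receive(1, "p2")
    mc.ignore("p2", {"a"})
    p3_received = mc.can_receive(1, "p3")
    if p3_received:
        mc.receive(1, "p3")
    mc.ignore("p3", {"a"})
    return p3_received, sorted(mc.network)
```

With N = 3, calling `scenario(0, 3)`, `scenario(1, 1)` and `scenario(3, 3)` gives the three behaviors discussed for channel a: the message disappears once nobody is interested, it disappears right after the first reception, or it stays because it has not yet been received N times.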

4.2. A physical micromodel for convergecast communication

Convergecast communication, or N-to-1 communication, is the dual of multicast. Where multicast sends a message to a set of peers, convergecast consists in the reception of a set of messages in one action [31,27,21]. This communication primitive is interesting as a building block for more complex architectures (e.g. join in a fork-join schema). Contrary to the multicast micromodel, the convergecast micromodel could actually be simulated with point-to-point communication: to receive messages from a set of peers, a peer receives them individually until it has received all of them. However, having a dedicated micromodel is useful, to make explicit a convergence operation in an algorithm, and to reduce the interleaving in the reception. As the transition of multireception only occurs when all the messages are available, this reduces the n! possible executions coming from the interleavings of individual receptions of each message to one transition. Since our framework uses the TLC model checker to verify the correctness, this point alone suffices to justify convergecast.
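As a small illustration of the n! figure, the sketch below enumerates the orders in which one peer could receive, one by one, a message from each of n senders. The function name and the peer names are hypothetical; a single multireception would replace all of these orders by one transition.

```python
from itertools import permutations
from math import factorial


def individual_reception_orders(senders):
    """All orders in which one message per sender can be received individually."""
    return list(permutations(senders))


senders = ["p1", "p2", "p3", "p4"]
orders = individual_reception_orders(senders)
# The number of interleavings equals n! for n = len(senders).
print(len(orders), factorial(len(senders)))
```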

Like multicast, convergecast is parameterized by MIN and MAX. A multireception of n messages is enabled if MIN ≤ n ≤ MAX. Note that the n messages must come from n distinct peers: a multireception never delivers, in the same action, two messages issued by the same peer. Point-to-point communication exactly corresponds to convergecast(1,1). convergecast(N,N) models all-to-one communication where a message must have been sent by all the peers (including the receiver) for the multireception to be enabled. convergecast(1,N) allows the reception of an arbitrary subset of the messages in transit to the receiver.

In addition to the send, receive and ignore communication actions, a new action is added to the multicom module: multireceive(receiver, channel, datas). This action is always disabled in the point-to-point and multicast micromodels. In the convergecast micromodel, it is enabled when the reception by peer receiver of a set of messages sent by distinct peers on channel channel, and such that the bag holding the content of the messages is datas, is possible. As for receive, we assume peers cannot prevent a delivery based on the content of the messages, and a reception action in the specification of a composition has the form ∃ datas ∈ SubBag(. . .) : multireceive(_, _, datas) ∧ P(datas).

The specification of the convergecast is presented in Fig. 11. The precondition of multireceive takes ids, a set of message identifiers. It checks that the number of messages is within the bounds of the convergecast micromodel, that each message is actually deliverable, that all the messages come from different peers, and builds datas as the bag of all message contents. As in point-to-point communication, a received message is removed from the messages in transit as it can be received at most once.
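The checks performed by the multireceive precondition can be sketched in Python as follows. This is an illustration under assumptions, not the module of Fig. 11: the function names and the dictionary layout of the network are hypothetical, and the sketch returns the bag of payloads (as a `Counter`) instead of comparing it with a `datas` argument as the TLA+ predicate does.

```python
from collections import Counter


def pre_multi_receive(network, ids, to, channel, min_recv, max_recv):
    """Return the bag of payloads if the multireception is enabled, else None.

    network maps a message id to a dict with keys 'from', 'to', 'channel', 'data'.
    """
    # The number of messages must lie within the MIN..MAX bounds.
    if not (min_recv <= len(ids) <= max_recv):
        return None
    msgs = []
    for i in ids:
        m = network.get(i)
        # Each message must be in transit, addressed to the receiver, on the channel.
        if m is None or to not in m["to"] or m["channel"] != channel:
            return None
        msgs.append(m)
    # All the messages must come from distinct peers.
    if len({m["from"] for m in msgs}) != len(msgs):
        return None
    return Counter(m["data"] for m in msgs)


def post_multi_receive(network, ids):
    """The received messages are removed from the messages in transit."""
    return {i: m for i, m in network.items() if i not in ids}
```

Returning the bag rather than testing equality is a design choice of this sketch only; it keeps the example short while still showing the three conditions and the construction of the bag.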

Like the point-to-point and multicast physical micromodels, the convergecast micromodel is expected to be used in combination with other micromodels, such as the cap micromodel or any of the ordering micromodels described in the next section. For instance, one can combine convergecast and FIFO 1–1 (described below). In this way, successive multireceptions will get messages from each peer in their send order. In practice, convergecast is mainly used in a fork-join schema and is used without any ordering or with FIFO 1–1.

4.3. Constraining micromodels

We provide non-physical micromodels that limit the enabledness of sending or receiving a message. These models implement classical constraints of common communication models. First, we provide generic message-ordering models such as FIFO delivery. We also provide an applicative message-ordering model where priorities between channels can be expressed.


    MODULE multicast
    EXTENDS Naturals, FiniteSets
    CONSTANTS ID, PEERS, CHANNEL, DATATYPE, MIN, MAX
    PhysicalMicromodel ≜ TRUE

    LOCAL Message ≜ [
        id : ID,                       \* Message identifier
        from : PEERS,                  \* Sender
        to : SUBSET PEERS,             \* Possible receivers
        channel : CHANNEL,             \* Channel
        data : DATATYPE,               \* Payload
        receivedBy : SUBSET PEERS ]    \* Peers it has already been delivered to
    LOCAL Network ≜ SUBSET Message
    LOCAL Interest ≜ [PEERS → SUBSET CHANNEL]

    TypeInvariant(s) ≜ s ∈ [network : Network, interest : Interest]
    Init ≜ [network ↦ {}, interest ↦ [peer ∈ PEERS ↦ CHANNEL]]
    usedIds(s) ≜ {m.id : m ∈ s.network}

    postIgnore(s, peer, chan_set) ≜
        LET new_peer_interest ≜ s.interest[peer] \ chan_set IN
        [s EXCEPT !.interest = [@ EXCEPT ![peer] = new_peer_interest],
                  !.network = {m ∈ @ :
                      ∨ m.channel ∈ new_peer_interest
                      ∨ Cardinality(m.receivedBy) < MIN                          \* not received enough
                      ∨ ∃ p ∈ PEERS \ {peer} : m.channel ∈ s.interest[p]}]      \* another peer is still interested

    \* Emission: the message is added to the network
    preSend(s, id, from, to, channel, data) ≜ TRUE
    postSend(s, id, from, to, channel, data) ≜
        [s EXCEPT !.network = @ ∪ {[id ↦ id, from ↦ from, to ↦ to, channel ↦ channel,
                                     data ↦ data, receivedBy ↦ {}]}]

    preReceive(s, id, to, channel, data) ≜
        ∃ m ∈ s.network :                \* The metadata of a message in transit match.
            ∧ m.id = id ∧ to ∈ m.to ∧ m.channel = channel ∧ m.data = data ∧ channel ∈ s.interest[to]
            ∧ to ∉ m.receivedBy          \* The peer has not received it yet.

    postReceive(s, id, to, channel, data) ≜
        LET m ≜ (CHOOSE x ∈ s.network : x.id = id) IN
        LET newm ≜ [m EXCEPT !.receivedBy = @ ∪ {to}] IN
        IF ∧ Cardinality(newm.receivedBy) < MAX
           ∧ ∨ Cardinality(newm.receivedBy) < MIN
             ∨ ∃ p ∈ PEERS : newm.channel ∈ s.interest[p]
        THEN [s EXCEPT !.network = (s.network \ {m}) ∪ {newm}]    \* keep the message
        ELSE [s EXCEPT !.network = (s.network \ {m})]             \* drop it

Fig. 10. TLA+ specification of the generic multicast physical micromodel. The parameters MIN and MAX make it possible to use different instances of this module to model multicast communication, one-to-all communication, point-to-point communication, or in-between variants.

The previously presented micromodel that caps the number of messages in transit is another constraining model. A dedicated micromodel for voting and bounding the number of sent messages is also specified. These various micromodels can all be combined together and their diversity demonstrates the power of our framework. New micromodels (e.g. for content filtering) are easily specified in a few lines.

4.3.1. Generic message-ordering micromodels

We provide non-physical micromodels for a large set of generic message-ordering policies. A detailed description, both axiomatic and operational, of classic point-to-point communication models is found in [11]. They include the following:

• RSC Realizable with Synchronous Communication [8,21]. The emission of a message is immediately followed by its delivery. Viewed atomically, it corresponds to synchronous communication.

• FIFO n–n Messages are globally ordered and are delivered in their emission order.

• FIFO 1–n Messages sent from a same peer are delivered in their emission order.


    MODULE convergecast
    EXTENDS FiniteSets, Bags, Naturals
    CONSTANTS ID, PEERS, CHANNEL, DATATYPE, MIN, MAX
    PhysicalMicromodel ≜ TRUE

    LOCAL Message ≜ [
        id : ID,               \* Message identifier
        from : PEERS,          \* Sender
        to : SUBSET PEERS,     \* Possible receivers
        channel : CHANNEL,     \* Channel
        data : DATATYPE ]      \* Payload
    LOCAL Network ≜ SUBSET Message

    TypeInvariant(s) ≜ s ∈ [network : Network]
    Init ≜ [network ↦ {}]
    usedIds(s) ≜ {m.id : m ∈ s.network}

    postIgnore(s, peer, chan_set) ≜ s

    \* Like in point-to-point / multicast
    preSend(s, id, from, to, channel, data) ≜ TRUE
    postSend(s, id, from, to, channel, data) ≜
        [s EXCEPT !.network = @ ∪ {[id ↦ id, from ↦ from, to ↦ to, channel ↦ channel, data ↦ data]}]

    preMultiReceive(s, ids, to, channel, datas) ≜
        ∧ Cardinality(ids) ∈ MIN .. MAX
        ∧ ∀ id ∈ ids :
            ∃ m ∈ s.network :
                ∧ m.id = id
                ∧ to ∈ m.to
                ∧ m.channel = channel
        ∧ ∀ i, j ∈ ids : i ≠ j ⇒      \* the messages come from different peers
            (CHOOSE m ∈ s.network : m.id = i).from ≠ (CHOOSE m ∈ s.network : m.id = j).from
        ∧ datas = BagOfAll(LAMBDA m : m.data, SetToBag({m ∈ s.network : m.id ∈ ids}))

    postMultiReceive(s, ids, to, channel, datas) ≜      \* the messages are removed
        [s EXCEPT !.network = {m ∈ @ : m.id ∉ ids}]

    preReceive(s, id, to, channel, data) ≜              \* a single receive is allowed if MIN = 1
        preMultiReceive(s, {id}, to, channel, SetToBag({data}))
    postReceive(s, id, to, channel, data) ≜
        postMultiReceive(s, {id}, to, channel, SetToBag({data}))

Fig. 11. TLA+ specification of the generic convergecast physical micromodel. The distinctive feature with regard to p2p/multicast is preMultiReceive which checks that the number of messages is correct, that they are in transit, that they come from different peers, and which constructs the bag of the message payloads.

• FIFO 1–1 Messages between a couple of peers are delivered in their emission order. Messages from/to different peers are independently delivered.

• causal Messages are delivered according to the causality of their emission [22]. If a message m1 is causally sent before a message m2 (i.e. there exists a causal path from the first emission to the second one), then a peer cannot get m2 before m1.

The communication models in [11] are only for point-to-point communication. Moreover they are standalone, including the management of the lifespan of messages in transit. They have been rewritten to obtain specifications of their ordering policies that follow the previous conventions as pluggable, multicast-ready and convergecast-ready micromodels that make use of the concept of interest and rely on message histories. The FIFO n–n micromodel is shown in Fig. 12.
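The idea of message histories can be sketched in a few lines of Python. This is a simplified illustration, not the fifonn module: the function names are hypothetical, the network is reduced to a map from message identifier to history, and receivers, channels and multireception are omitted. The `remove` flag stands for the decision of the physical micromodel to take the message out of the network.

```python
def post_send(network, msg_id):
    """Add msg_id with, as history, the ids of all messages that precede it."""
    history = set()
    for other_id, other_history in network.items():
        history |= other_history | {other_id}
    return {**network, msg_id: history}


def pre_receive(network, msg_id):
    """A message is deliverable if no message of its history is still in transit."""
    return msg_id in network and network[msg_id].isdisjoint(network)


def post_receive(network, msg_id, remove):
    """Purge the message and its traces only if the physical micromodel removed it."""
    if not remove:
        return network
    return {i: h - {msg_id} for i, h in network.items() if i != msg_id}
```

For instance, after sending messages 1 and then 2, message 2 is blocked until message 1 has been received and removed; if the physical micromodel keeps message 1 in transit (multicast), message 2 stays blocked.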

4.3.2. Applicative message-ordering micromodel

We also provide a micromodel where priorities are assigned to channels, instead of ordering the deliveries with regard to the emission events. If a channel a has a higher priority than a channel b, then the existence of a message on a blocks the delivery of any message on b, for the same receiver. These messages will become deliverable only after the message on a has been received. A classic use can be found in abortion messages. If the communication model allows the system to take other messages over the abortion one, this results in a seemingly unresponsive behavior to abortion or presents security issues.


    MODULE fifonn
    CONSTANTS ID, PEERS, CHANNEL
    PhysicalMicromodel ≜ FALSE
    LOCAL Message ≜ [id : ID, from : PEERS, to : SUBSET PEERS, channel : CHANNEL, history : SUBSET ID]
    LOCAL Network ≜ SUBSET Message
    TypeInvariant(s) ≜ s ∈ [network : Network]
    Init ≜ [network ↦ {}]
    usedIds(s) ≜ {m.id : m ∈ s.network}

    postIgnore(s, peer, chan_set, removedIds) ≜ s
    preSend(s, id, from, to, channel, data) ≜ TRUE
    postSend(s, id, from, to, channel, data) ≜
        [s EXCEPT !.network = @ ∪ {[id ↦ id, from ↦ from, to ↦ to, channel ↦ channel,
                                     history ↦ UNION {m.history ∪ {m.id} : m ∈ s.network}]}]

    preReceive(s, id, to, channel, data) ≜
        ∃ m ∈ s.network :
            ∧ m.id = id ∧ to ∈ m.to ∧ m.channel = channel
            ∧ ¬∃ m2 ∈ s.network : m2.id ∈ m.history      \* there is no preceding message in transit
    postReceive(s, id, to, channel, data, remove) ≜
        IF remove
        THEN [s EXCEPT !.network = {[mes EXCEPT !.history = @ \ {id}] : mes ∈ {mes2 ∈ s.network : mes2.id ≠ id}}]
        ELSE s

    preMultiReceive(s, ids, to, channel, datas) ≜
        ∀ id ∈ ids : ∃ m ∈ s.network :
            ∧ ¬∃ m2 ∈ {mm ∈ s.network : mm.id ∉ ids} : m2.id ∈ m.history
    postMultiReceive(s, ids, to, channel, datas, remove) ≜
        [s EXCEPT !.network = {[mes EXCEPT !.history = @ \ ids] : mes ∈ {mes2 ∈ s.network : mes2.id ∉ ids}}]

Fig. 12. TLA+ module of the FIFO n–n micromodel. A message is deliverable (preReceive) if its history does not contain another message in transit, which must be delivered before.

An extract of the TLA+ model is shown in Fig. 13. Priorities are modelled by a set of channel pairs that parameterizes the micromodel (BLOCKS constant): ⟨a, b⟩ ∈ BLOCKS means that a has a higher priority than b. A reception of a message on a channel is enabled (preReceive) if there is no message for the same peer on another channel with a higher priority.
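The blocking rule can be sketched as a Python predicate. The layout of `network` and the function name are assumptions made for the illustration; in particular, the checks that the message is in transit and addressed to the receiver are added here for self-containment and are not taken from the extract of the priority module.

```python
def pre_receive(network, msg_id, to, blocks):
    """Tell whether peer `to` may receive message `msg_id`.

    network maps a message id to (set of possible receivers, channel);
    blocks is a set of (high, low) channel pairs, as the BLOCKS constant.
    """
    if msg_id not in network:
        return False
    receivers, channel = network[msg_id]
    if to not in receivers:
        return False
    # Blocked if a message for the same peer is in transit on a channel
    # that has a higher priority than the channel of msg_id.
    return not any(
        to in other_receivers and (other_channel, channel) in blocks
        for other_receivers, other_channel in network.values()
    )
```

With `blocks = {("abort", "data")}`, a pending message on `abort` for a peer prevents this peer from receiving on `data`, while other peers are unaffected.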

4.3.3. Message cap micromodel

This micromodel ensures that the number of messages in transit is capped by an upper bound. It was presented in Section 3 and its TLA+ specification is in Fig. 3.

4.3.4. Voting micromodel

The voting micromodel limits the number of messages a peer can send on a set of channels during an execution. Once a peer has reached the limit, its send action on these channels is permanently disabled. While the message cap micromodel disables sending for all peers and by looking at the current number of messages in transit, the voting micromodel disables sending per peer and by taking into account its past actions. This model is especially useful to implement voting by setting the limit to 1: no peer can send a message (i.e. vote) more than once on the configured channels. Another use is to limit a cyclic behavior to occur a bounded number of times. This reduces the state space and accelerates model checking of the system. As said earlier, note that this bounded model-checking technique is to be used only to find bugs, and some liveness properties may become invalid on these finite executions.

The voting module is shown in Fig. 14. It consists in keeping a state that counts the number of sent messages per peer (field sent). This state is used to allow send (preSend) and is then updated (postSend).
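The per-peer counter can be mimicked by the following Python sketch. The three function names are hypothetical and only loosely follow the Init, preSend and postSend definitions; the state is reduced to the `sent` counters.

```python
def init(peers):
    """No peer has sent anything yet."""
    return {peer: 0 for peer in peers}


def pre_send(sent, sender, bound):
    """Sending is enabled while the sender has sent fewer than `bound` messages."""
    return sent[sender] < bound


def post_send(sent, sender):
    """Record one more emission for the sender (returns a new state)."""
    return {**sent, sender: sent[sender] + 1}
```

With a bound of 1, a peer can vote once: after `post_send`, `pre_send` is false for this peer and still true for the others.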


    MODULE priority
    CONSTANTS ID, PEERS, CHANNEL, BLOCKS
    \* BLOCKS is a set of channel pairs: ⟨ca, cb⟩ means that a message on ca blocks the delivery of a message on cb.
    LOCAL Message ≜ [id : ID, to : SUBSET PEERS, channel : CHANNEL]
    Init ≜ [network ↦ {}]
    . . .
    preSend(s, id, from, to, channel, data) ≜ TRUE
    postSend(s, id, from, to, channel, data) ≜
        [s EXCEPT !.network = @ ∪ {[id ↦ id, to ↦ to, channel ↦ channel]}]
    preReceive(s, id, to, channel, data) ≜
        ∃ m ∈ s.network :
            ∧ ¬∃ m2 ∈ s.network :      \* there is no other message in transit for this peer with a higher priority
                ∧ to ∈ m2.to ∧ ⟨m2.channel, m.channel⟩ ∈ BLOCKS
    postReceive(s, id, to, channel, data, remove) ≜
        IF remove THEN [s EXCEPT !.network = {mes ∈ @ : mes.id ≠ id}] ELSE s
    . . .

Fig. 13. TLA+ module of the priority micromodel (extract).

    MODULE voting
    CONSTANTS PEERS, CHANNEL, BOUND      \* Maximum number of sent messages per peer
    PhysicalMicromodel ≜ FALSE
    TypeInvariant(s) ≜ s ∈ [sent : [PEERS → Nat]]
    Init ≜ [sent ↦ [peer ∈ PEERS ↦ 0]]
    usedIds(s) ≜ {}

    postIgnore(s, peer, chan_set, removedIds) ≜ s
    preSend(s, id, from, to, channel, data) ≜
        s.sent[from] < BOUND
    postSend(s, id, from, to, channel, data) ≜ [s EXCEPT !.sent = [@ EXCEPT ![from] = @ + 1]]
    preReceive(s, id, to, channel, data) ≜ TRUE
    postReceive(s, id, to, channel, data, remove) ≜ s
    preMultiReceive(s, ids, to, channel, datas) ≜ TRUE
    postMultiReceive(s, ids, to, channel, datas, remove) ≜ s

Fig. 14. TLA+ module of the voting micromodel.

4.4. Analysis of the example

Getting back to our introductory example, let's consider in detail its description. It uses four channels: submission from authors to chairs (multicast to all, and voting with bound 1 to limit the number of submissions), paper from chairs to reviewers (bounded multicast based on the number of expected reviews), review from reviewers to chairs (convergecast), and acceptation from chairs to authors (point-to-point). Additionally, all chairs need to attribute the same number to a given paper, and without internal coordination between the PC chairs, the authors must use a totally ordered multicast so that the papers are delivered in the same order to all the chairs. As demonstrated in Section 5.3.3, this is achieved with the FIFO n–1 ordering model on the submission channel.

The verified properties are both safety and liveness:

• Safety: one author is handled by exactly one chair:

    LET handledAuthors(pres) ≜ {papers[pres][id].author : id ∈ DOMAIN papers[pres]} IN


Table 1
Number of transitions / distinct states for the reviewing example.

    Configuration                       cap = 1           cap = 2             cap = 3               cap = 4
    2 authors, 2 chairs, 2 reviewers    4151 / 1962       63481 / 24204       599625 / 166481       2498881 / 560994
    2 authors, 2 chairs, 3 reviewers    26191 / 12820     694385 / 232776     6970035 / 1737452
    2 authors, 2 chairs, 4 reviewers    158289 / 68996    4726501 / 1394440   47312453 / 10664656
    3 authors, 2 chairs, 2 reviewers    86819 / 50200     9382271 / 3530626

• Liveness: any submission eventually gets an acceptation/rejection and the authors terminate:

    AuthorsGetAnswer ≜ ∀ i ∈ IdAuthors : ◇(pc[i] = "Done")

This system exposes both strict ordering constraints (submissions sent to the chairs), and high interleaving (each reviewer is independently handling the papers it has received). During the development of the system, several bugs were found. For instance, the logic to split the papers among the chairs was faulty with an odd number of chairs and some authors were never receiving their acceptance result; in some cases, the same paper was sent twice to the reviewers, and an unfortunate (but legal) interleaving in the reception of the reviews led to two acceptance messages to the same author. This system, albeit simple, already experiences enough communication interactions to warrant formal verification.

In Table 1, we present some results obtained from running TLC, the TLA+ model checker. The message cap on the number of messages in transit is instrumental to avoid state explosion as it ensures that messages are not delayed for too long.

5. Properties of multicast and convergecast communication

This section presents results that allow models to be compared. First, these results are essential to substitutability, the ability to replace one model with another, without having to redo the proofs. We say that a communication model M1 is stricter than a communication model M2 if M1 cannot deliver more messages than M2, in other words, if any message that M1 delivers is also deliverable in M2. Thus, for any system using asynchronous communication, a safety property which is proved with M2 is necessarily true when substituting M2 with M1. Liveness properties are also preserved if the stricter model does not cause more blocking. For instance, FIFO 1–1 does not block more than unordered communication and liveness properties are preserved, while RSC doesn't allow two consecutive send events without a receive event between them, and thus a system may deadlock with RSC while progressing with a more liberal model.
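The argument on safety properties can be made concrete on a toy case. The sketch below is only an illustration with hypothetical names: it enumerates the executions made of two messages sent by p1 to p2, once without ordering and once with a FIFO filter, and checks that a property holding on every execution of the liberal model also holds on every execution of the stricter one, while the converse fails.

```python
from itertools import permutations

S1, S2 = ("s", "p1", "m1"), ("s", "p1", "m2")
R1, R2 = ("r", "p2", "m1"), ("r", "p2", "m2")


def well_formed(sigma):
    """Each reception is preceded by the emission of the same message."""
    return sigma.index(S1) < sigma.index(R1) and sigma.index(S2) < sigma.index(R2)


def fifo(sigma):
    """Receptions follow the emission order (both messages go from p1 to p2)."""
    return (sigma.index(S1) < sigma.index(S2)) == (sigma.index(R1) < sigma.index(R2))


def first_event_is_a_send(sigma):
    """A safety property that holds on every well-formed execution."""
    return sigma[0][0] == "s"


def holds_on(model, prop):
    return all(prop(sigma) for sigma in model)


# Executions of the liberal model (unordered) and of the stricter one (FIFO).
unordered = {sigma for sigma in permutations((S1, S2, R1, R2)) if well_formed(sigma)}
fifo_execs = {sigma for sigma in unordered if fifo(sigma)}
```

Here the stricter model has fewer executions, so `first_event_is_a_send`, once established on `unordered`, needs no new proof on `fifo_execs`. The property `fifo` itself holds on `fifo_execs` but not on `unordered`, which shows that the substitution only works in one direction.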

Secondly, these results help in differentiating the models. For instance FIFO n–1 is a model where each peer has an input mailbox, and senders add messages to it. It is sometimes labelled plainly as asynchronous and confused with FIFO 1–1. It is actually stricter than causal communication. Moreover, it induces totally-ordered communication: two independent multicasts will be delivered in the same order to all the common peers, without any additional coordination. Its dual model FIFO 1–n, where each peer has an output mailbox where it puts messages for the receivers to retrieve, is peculiar: it is incomparable to causal except in point-to-point communication, and it does not induce totally-ordered communication.

Providing hierarchies of message-ordering policies helps developers with substitutability and gives them a better intuition of their properties. In the following we recall the hierarchies for point-to-point communication and give three new hierarchies: for multicast communication, for totally-ordered communication and for convergecast communication.

5.1. Formal specification

To study the relations between the models, the set of executions of each model is formally defined.

5.1.1. Specification of executions

Consider a set of messages $M$ and a set of peers $P$, let $E \triangleq \{s(p,m) \mid p \in P \land m \in M\} \cup \{r(p,m) \mid p \in P \land m \in M\} \cup \{\mathit{mr}(p,\mathit{mm}) \mid p \in P \land \mathit{mm} \in \mathcal{P}(M)\}$ (where $\mathcal{P}(M)$ is the powerset of $M$) be the set of communication events: the disjoint union of the set of send, receive and multireceive events.

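An execution $\sigma$ is a finite or infinite sequence of events such that a message is sent at most once, no message is received more than once on the same peer, and a receive event of a message is preceded by a send event of this message:

$$
\begin{aligned}
&\forall p \in P : \forall m \in M : \forall j, k \in \mathrm{dom}(\sigma) : \\
&\qquad \forall p' \in P : \sigma_j = s(p,m) \land \sigma_k = s(p',m) \Rightarrow j = k && (a) \\
&\quad \land\ \sigma_j = r(p,m) \land \sigma_k = r(p,m) \Rightarrow j = k && (b) \\
&\quad \land\ \sigma_j = r(p,m) \Rightarrow \exists i \in \mathrm{dom}(\sigma) : \exists p' \in P : \sigma_i = s(p',m) \land i < j && (c)
\end{aligned}
\tag{1}
$$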

Additionally, if multireception (convergecast) is allowed, each of the received messages is received at most once, the multireception must be preceded by the emission events of each message, and for $\mathit{mr}(p,\mathit{mm})$, all messages come from distinct peers:


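$$
\begin{aligned}
&\forall p \in P : \forall j, k \in \mathrm{dom}(\sigma) : \\
&\qquad \forall p' \in P : \forall \mathit{mm}, \mathit{mm}' \in \mathcal{P}(M) : \forall m \in M : \\
&\qquad\qquad m \in \mathit{mm} \land m \in \mathit{mm}' \land \sigma_j = \mathit{mr}(p,\mathit{mm}) \land \sigma_k = \mathit{mr}(p',\mathit{mm}') \Rightarrow j = k && (a) \\
&\quad \land\ \forall \mathit{mm} \in \mathcal{P}(M) : \forall m \in \mathit{mm} : \sigma_j = \mathit{mr}(p,\mathit{mm}) \Rightarrow \exists i \in \mathrm{dom}(\sigma) : \exists p' \in P : \sigma_i = s(p',m) \land i < j && (b) \\
&\quad \land\ \forall \mathit{mm} \in \mathcal{P}(M) : \sigma_j = \mathit{mr}(p,\mathit{mm}) \Rightarrow |\{p' \in P : \exists m \in \mathit{mm}, \exists i \in \mathrm{dom}(\sigma) : \sigma_i = s(p',m)\}| = |\mathit{mm}| && (c)
\end{aligned}
\tag{2}
$$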

Point-to-point communication adds an additional constraint on executions as specified by (1): no message is received more than once (whereas multicast communication imposes that no message is received more than once on the same peer):

$$
\forall p, p' \in P : \forall m \in M : \forall j, k \in \mathrm{dom}(\sigma) : \sigma_j = r(p,m) \land \sigma_k = r(p',m) \Rightarrow j = k \tag{3}
$$
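On a finite execution, constraints (1) and (3) can be checked directly. The following Python sketch assumes a hypothetical encoding where an event is a tuple `(kind, peer, message)` with `kind` equal to `"s"` or `"r"`; multireceive events and constraint (2) are left out of this illustration.

```python
def is_execution(sigma):
    """Check constraint (1) on a finite sequence of ('s' | 'r', peer, message) events."""
    sent, received = set(), set()
    for kind, peer, msg in sigma:
        if kind == "s":
            if msg in sent:                  # (1a) a message is sent at most once
                return False
            sent.add(msg)
        elif kind == "r":
            if (peer, msg) in received:      # (1b) at most one reception per peer
                return False
            if msg not in sent:              # (1c) a reception follows the emission
                return False
            received.add((peer, msg))
        else:
            raise ValueError(f"unsupported event kind: {kind!r}")
    return True


def is_point_to_point(sigma):
    """Check constraint (3): no message is received more than once, by any peer."""
    receptions = [msg for kind, _, msg in sigma if kind == "r"]
    return len(receptions) == len(set(receptions))
```

For example, an execution where p1 sends m and both p2 and p3 receive it satisfies (1) but not (3): it is a multicast execution, not a point-to-point one.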

5.1.2. Specification of the generic ordering micromodels

Each communication model is characterized by the set of executions it allows to unfold. For instance, the set of executions of FIFO n–n contains all the executions such that if a reception happens before another, the two emissions of the messages must have happened in the same order. The generic message-ordering properties of the micromodels described in 4.3.1 are specified as additional constraints on executions (equations (1), (2) and (3)):

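• RSC (Realizable with Synchronous Communication)

$$
\forall m \in M, \forall p_1, p_2 \in P, \forall i, j \in \mathrm{dom}(\sigma) : \sigma_i = s(p_1,m) \land \sigma_j = r(p_2,m) \Rightarrow (j = i + 1) \tag{4}
$$

• FIFO n–n

$$
\begin{aligned}
&\forall m, m' \in M, \forall p_1, p_1', p_2, p_2' \in P, \forall i, j, k, l \in \mathrm{dom}(\sigma) : \\
&\qquad \sigma_i = s(p_1,m) \land \sigma_j = s(p_1',m') \land \sigma_k = r(p_2,m) \land \sigma_l = r(p_2',m') \Rightarrow \big((i < j) \Leftrightarrow (k < l)\big)
\end{aligned}
\tag{5}
$$

• FIFO 1–n

$$
\begin{aligned}
&\forall m, m' \in M, \forall p_1, p_2, p_2' \in P, \forall i, j, k, l \in \mathrm{dom}(\sigma) : \\
&\qquad \sigma_i = s(p_1,m) \land \sigma_j = s(p_1,m') \land \sigma_k = r(p_2,m) \land \sigma_l = r(p_2',m') \Rightarrow \big((i < j) \Leftrightarrow (k < l)\big)
\end{aligned}
\tag{6}
$$

• FIFO n–1

$$
\begin{aligned}
&\forall m, m' \in M, \forall p_1, p_1', p_2 \in P, \forall i, j, k, l \in \mathrm{dom}(\sigma) : \\
&\qquad \sigma_i = s(p_1,m) \land \sigma_j = s(p_1',m') \land \sigma_k = r(p_2,m) \land \sigma_l = r(p_2,m') \Rightarrow \big((i < j) \Leftrightarrow (k < l)\big)
\end{aligned}
\tag{7}
$$

• FIFO 1–1

$$
\begin{aligned}
&\forall m, m' \in M, \forall p_1, p_2 \in P, \forall i, j, k, l \in \mathrm{dom}(\sigma) : \\
&\qquad \sigma_i = s(p_1,m) \land \sigma_j = s(p_1,m') \land \sigma_k = r(p_2,m) \land \sigma_l = r(p_2,m') \Rightarrow \big((i < j) \Leftrightarrow (k < l)\big)
\end{aligned}
\tag{8}
$$

• Causal. This model is the most peculiar as it uses Lamport's well-known causal order [22], denoted $\prec$, and defined as the reflexive transitive closure of $s(p,m) \prec r(p',m)$ (reception is caused by emission) and local order on peer.

$$
\begin{aligned}
&\forall m, m' \in M, \forall p_1, p_1', p_2 \in P, \forall i, j, k, l \in \mathrm{dom}(\sigma) : \\
&\qquad \sigma_i = s(p_1,m) \land \sigma_j = s(p_1',m') \land \sigma_k = r(p_2,m) \land \sigma_l = r(p_2,m') \Rightarrow \big((\sigma_i \prec \sigma_j) \Rightarrow (k < l)\big)
\end{aligned}
\tag{9}
$$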
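The ordering constraints (4) to (8) translate almost literally into checks on a finite execution. The sketch below is an illustration with hypothetical names and the same hypothetical event encoding `(kind, peer, message)` with `kind` in `"s"`, `"r"`; the four FIFO variants only differ in whether the two senders, or the two receivers, are required to be the same peer. The causal model (9) is not covered, since it additionally requires computing the causal order.

```python
from itertools import product


def fifo_holds(sigma, same_sender, same_receiver):
    """Literal check of (5)-(8); the flags select which peers must coincide."""
    sends = [(i, p, m) for i, (kind, p, m) in enumerate(sigma) if kind == "s"]
    recvs = [(k, p, m) for k, (kind, p, m) in enumerate(sigma) if kind == "r"]
    for (i, p1, m), (j, p1b, m2) in product(sends, repeat=2):
        if same_sender and p1 != p1b:
            continue
        for (k, p2, mk), (l, p2b, ml) in product(recvs, repeat=2):
            if mk != m or ml != m2:
                continue
            if same_receiver and p2 != p2b:
                continue
            if (i < j) != (k < l):       # emission and reception orders disagree
                return False
    return True


def fifo_nn(sigma):
    return fifo_holds(sigma, same_sender=False, same_receiver=False)


def fifo_1n(sigma):
    return fifo_holds(sigma, same_sender=True, same_receiver=False)


def fifo_n1(sigma):
    return fifo_holds(sigma, same_sender=False, same_receiver=True)


def fifo_11(sigma):
    return fifo_holds(sigma, same_sender=True, same_receiver=True)


def rsc(sigma):
    """Check (4): every reception immediately follows the matching emission."""
    sent_at = {m: i for i, (kind, _, m) in enumerate(sigma) if kind == "s"}
    return all(k == sent_at[m] + 1
               for k, (kind, _, m) in enumerate(sigma)
               if kind == "r" and m in sent_at)
```

As a usage example, take p1 sending m1, then p2 sending m2, then p3 receiving m2 before m1. This execution satisfies FIFO 1–1 and FIFO 1–n (the senders differ) but violates FIFO n–1 and FIFO n–n, which matches the fact that FIFO n–1 orders all the messages arriving in the mailbox of a peer.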

5.2. Correlation with the micromodels of the framework

The communication models are described by the executions they unfold, and they are specified in TLA+ in the framework. We now prove that both descriptions are equivalent, i.e. that the models in the framework are correct and complete with regard to the execution-based specifications. The correctness means that all the executions generated by the framework respect the constraints (1) + ((4)-(9) depending on the model) (+(2) for convergecast, +(3) for point-to-point). The completeness means that, for a set of peers, the framework generates all the possible executions which conform to (1) + ((4)-(9) depending on the model) (+(2) for convergecast, +(3) for point-to-point) and to the peers' behaviors, without omitting any of them.