Semantics of an imperative BSP micro-language

Frédéric Gava

L.A.C.L

Laboratoire d'Algorithmique, Complexité et Logique

M2 SSI course, PSSR option

Course outline

1 Natural semantics

2 Control flow and BSP primitives

3 Communications and global rules

4 Exercises

Natural semantics

A set of rules describing the behaviour of programs.

Also called big-step semantics.

The evaluation of the program is described from beginning to end.

A "tree" representing the evaluation is therefore built.

If there is an error, no rule can be applied (the tree cannot be built).

The opposite method is small-step semantics: a set of rules describing one reduction step of a program at a time.

The rules are applied until the program can no longer be reduced (error or end of the program).

Big-step and small-step are often mixed...
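The contrast between the two styles can be sketched on a toy language of additions. This is only an illustration under assumptions: the tuple encoding `("add", e1, e2)` and the function names are hypothetical and are not part of the micro-language defined later in the course.

```python
# Expressions: an int, or a tuple ("add", e1, e2). Hypothetical toy encoding.

def big_step(e):
    """Big-step: relate e directly to its final value (one evaluation tree)."""
    if isinstance(e, int):
        return e
    _, e1, e2 = e
    return big_step(e1) + big_step(e2)

def small_step(e):
    """Small-step: perform exactly one reduction; e must not be a value."""
    _, e1, e2 = e
    if not isinstance(e1, int):
        return ("add", small_step(e1), e2)   # reduce inside the left operand
    if not isinstance(e2, int):
        return ("add", e1, small_step(e2))   # then inside the right operand
    return e1 + e2                           # both operands are values

def run_small_steps(e):
    """Apply small_step until a value is reached; return every intermediate term."""
    trace = [e]
    while not isinstance(e, int):
        e = small_step(e)
        trace.append(e)
    return trace

expr = ("add", ("add", 1, 2), 4)
print(big_step(expr))         # 7
print(run_small_steps(expr))  # three terms: the expression, ("add", 3, 4), then 7
```

The big-step function only exposes the final value, whereas the small-step trace exposes each intermediate term, which is why errors and non-termination are easier to observe in the second style.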

Big-step: advantages and drawbacks

Simpler to read (though not always).

Better suited to proving programs (though not always).

Better suited to proving properties of the language (though not always).

Often more rules than in small-step.

Small-step semantics requires describing too many details.

Errors and diverging programs do not show up well in big-step.

Sometimes the small-step semantics is too complicated to write.

Overall, neither is better than the other; it all depends on the context...

Induction et Co-Induction

Inference system

An inference system is a set of inference rules, which are ordered pairs $\frac{P}{c}$, where $c$ is the conclusion of the rule and $P$ the set of its premises. The intuitive interpretation of such a rule is that the judgement $c$ can be inferred from the set of judgements $P$.

Interpretation

In the proof-theoretic approach, the inductive interpretation of the inference system is the set of conclusions of well-founded derivations, while the co-inductive interpretation is the set of conclusions of arbitrary derivations (ill-founded or well-founded).
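For a finite set of rules, the two interpretations can be computed as the least and the greatest fixed point of the "one rule application" operator. The sketch below assumes a hypothetical encoding of each rule as a pair `(premises, conclusion)` of strings; it is an illustration, not the construction used in the course.

```python
def step(rules, judgements):
    """Conclusions of all rules whose premises all belong to `judgements`."""
    return {c for premises, c in rules if set(premises) <= judgements}

def inductive(rules):
    """Least fixed point: iterate upwards from the empty set (well-founded derivations)."""
    current = set()
    while True:
        following = step(rules, current)
        if following == current:
            return current
        current = following

def coinductive(rules):
    """Greatest fixed point: iterate downwards from the set of all conclusions."""
    current = {c for _, c in rules}
    while True:
        following = step(rules, current)
        if following == current:
            return current
        current = following

# Hypothetical rules: "a" is an axiom, "b" follows from "a",
# "c" follows from itself (only an infinite derivation), "e" needs an underivable "d".
rules = [((), "a"), (("a",), "b"), (("c",), "c"), (("d",), "e")]
print(sorted(inductive(rules)))    # ['a', 'b']
print(sorted(coinductive(rules)))  # ['a', 'b', 'c']
```

The judgement `c` is justified only by the infinite derivation that keeps reusing its own rule, so it appears in the co-inductive interpretation and not in the inductive one. This is the same mechanism that later lets a co-inductive rule describe a diverging loop.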

Proof principles

1 Induction principle ($\frac{P}{c}$): to prove a property of the judgements of the inductive interpretation, proceed by structural induction over well-founded derivations. That is, show that the property holds for the conclusion $c$ of a derivation $d$, assuming that it holds for all conclusions $j$ of the strict subderivations of $d$.

2 Co-induction principle ($\frac{P}{c}$): to prove that all judgements of a given set are in the co-inductive interpretation, build a system of recursive equations between derivations, with unknowns $(x_j)$.

Each equation is of the form

\[
x_j = \frac{x_{j_1} \quad x_{j_2} \quad \cdots}{c}
\]

and must be justified by an inference rule. These equations are guarded; the derivations they define may be infinite but are rational.

The micro-language

Overview

Our core language is the classical IMP with a set Exp of expressions (booleans, integers, matrices, etc.) with operations on them. The set X of variables is a subset of Exp with two special variables: pid and nprocs.

Syntax

c ::= skip                            Null command
    | x := e                          Assignment
    | c₁ ; c₂                         Sequence
    | if e then c₁ else c₂ endif      Conditional
    | while e do c done               Iteration
    | declare y := e begin c end      New variable

with x, y ∈ X and e ∈ Exp.

The micro-language (2)

Definitions

$E_i$ is the store (memory, as a mapping from variables to values) of processor $i$ and $R_i$ is the set of received values.

Expressions

Expressions are evaluated to values $v$ (a subset of Exp) and we write $E_i, R_i \models^{i,p} e \Downarrow v$, with $p$ the number of processors and $i$ the pid. We suppose that $E_i, R_i \models^{i,p} \mathtt{pid} \Downarrow i$ and $E_i, R_i \models^{i,p} \mathtt{nprocs} \Downarrow p$.

Evaluation of Exp is not total (e.g. the evaluation of 1 + true) but, for simplicity, it always terminates.
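A minimal sketch of such a partial but terminating evaluator is given below. The encoding is an assumption made for illustration: expressions are Python constants, variable names as strings, or tuples `(op, e1, e2)`, and only `+` and `<` are supported. The exception `Stuck` is a hypothetical name for "no rule applies".

```python
class Stuck(Exception):
    """Raised when no evaluation rule applies (evaluation is not total)."""

def eval_expr(e, store, pid, nprocs):
    """Evaluate e in the store of processor `pid` among `nprocs` processors."""
    if isinstance(e, (bool, int)):
        return e                      # constants are already values
    if e == "pid":
        return pid                    # special variable: the processor's own id
    if e == "nprocs":
        return nprocs                 # special variable: the number of processors
    if isinstance(e, str):
        if e not in store:
            raise Stuck(f"unbound variable {e}")
        return store[e]
    op, left, right = e
    a = eval_expr(left, store, pid, nprocs)
    b = eval_expr(right, store, pid, nprocs)
    # bool is a subclass of int in Python, so booleans are excluded explicitly
    both_ints = all(isinstance(v, int) and not isinstance(v, bool) for v in (a, b))
    if op == "+" and both_ints:
        return a + b
    if op == "<" and both_ints:
        return a < b
    raise Stuck(f"cannot apply {op} to {a!r} and {b!r}")

print(eval_expr(("+", "pid", 1), {}, pid=2, nprocs=4))  # 3
```

Evaluating `("+", 1, True)` raises `Stuck`, which mirrors the example 1 + true: the recursion always terminates, but not every expression has a value.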

The micro-language (3)

Parallel primitives

| sync           Synchronisation barrier
| push(x)        Registers a variable x for global access
| pop(x)         Removes x from global access
| put(e,x,y)     Distant writing of x to y of processor e
| get(e,x,y)     Distant reading from x to y
| send(x,e)      Sending the value of x to processor e

Remarks

In contrast to a real BSP library, we use basic values instead of arbitrary buffer addresses (char*). Exp is extended with findmsg(i,e), which finds the e-th message of processor i from the previous super-step, and nmsg, which returns the arity of $R_i$ (i.e. the number of received values).

Notations

Environment

Our semantics is a set of inference rules. We write $E[x/v]$ for the insertion or substitution in $E$ of a new binding from $x$ to $v$. We write $R$ for the values received during the previous super-step and $C$ for the communications that need to be done in the current super-step.

Rules

Finite evaluations are noted $\Downarrow$ and infinite ones are noted $\Downarrow^{\infty}$ (to be read as "the program diverges").

Definitions

We write $\Downarrow_l$ for local reductions (i.e. one at each processor).

Local final configurations are:

end of computation: $\langle E', C', R', \mathtt{skip} \rangle$

intermediate local configuration: $\langle E', C', R', \mathrm{SYNC}(c) \rangle$, where $c$ is the sequence of instructions to run in the next super-steps.

A variable $x$ is decorated to indicate whether it has been registered for global access (DRMA) or not; the decoration is omitted when the distinction is not important.

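Inductive rules (1)

\[
\frac{E,C,R \models^{i,p} c_1 \Downarrow_l \langle E',C',R',\mathrm{SYNC}(c) \rangle}
     {E,C,R \models^{i,p} c_1;c_2 \Downarrow_l \langle E',C',R',\mathrm{SYNC}(c;c_2) \rangle}
\]

\[
\frac{E,C,R \models^{i,p} c_1 \Downarrow_l \langle E_1,C_1,R_1,\mathtt{skip} \rangle
      \qquad
      E_1,C_1,R_1 \models^{i,p} c_2 \Downarrow_l \langle E_2,C_2,R_2,\mathit{Flow} \rangle}
     {E,C,R \models^{i,p} c_1;c_2 \Downarrow_l \langle E_2,C_2,R_2,\mathit{Flow} \rangle}
\]

\[
\frac{}{E,C,R \models^{i,p} \mathtt{skip} \Downarrow_l \langle E,C,R,\mathtt{skip} \rangle}
\]

\[
\frac{E,R \models^{i,p} e \Downarrow v \qquad x \in E}
     {E,C,R \models^{i,p} x := e \Downarrow_l \langle E[x/v],C,R,\mathtt{skip} \rangle}
\]

where $\mathit{Flow} = \mathtt{skip}$ or $\mathrm{SYNC}(c)$.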

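Inductive rules (2)

\[
\frac{E,R \models^{i,p} e \Downarrow \mathtt{true}
      \qquad
      E,C,R \models^{i,p} c_1 \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}
     {E,C,R \models^{i,p} \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2\ \mathtt{endif} \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}
\]

\[
\frac{E,R \models^{i,p} e \Downarrow \mathtt{false}
      \qquad
      E,C,R \models^{i,p} c_2 \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}
     {E,C,R \models^{i,p} \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2\ \mathtt{endif} \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}
\]

\[
\frac{E,C,R \models^{i,p} \mathtt{if}\ e\ \mathtt{then}\ (c_1;\ \mathtt{while}\ e\ \mathtt{do}\ c_1\ \mathtt{done})\ \mathtt{else}\ \mathtt{skip}\ \mathtt{endif} \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}
     {E,C,R \models^{i,p} \mathtt{while}\ e\ \mathtt{do}\ c_1\ \mathtt{done} \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}
\]

\[
\frac{E,R \models^{i,p} e \Downarrow v
      \qquad x \notin E
      \qquad
      E[x/v],C,R \models^{i,p} c_1 \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}
     {E,C,R \models^{i,p} \mathtt{declare}\ x := e\ \mathtt{begin}\ c_1\ \mathtt{end} \Downarrow_l \langle E' \setminus \{x\},C',R',\mathit{Flow} \rangle}
\]

where $\mathit{Flow} = \mathtt{skip}$ or $\mathrm{SYNC}(c)$.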
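The way a `sync` interrupts a local evaluation and leaves a continuation can be sketched as a small recursive interpreter. This is a simplified illustration under assumptions: commands are hypothetical tagged tuples, expressions are Python functions from the store to a value, `declare` is left out, and the components C and R are omitted because none of these rules modifies them.

```python
SKIP = ("skip",)

def local(cmd, store):
    """Local big-step reduction: return (store', flow), where flow is
    SKIP or ("SYNC", continuation)."""
    tag = cmd[0]
    if tag == "skip":
        return store, SKIP
    if tag == "sync":
        return store, ("SYNC", SKIP)         # stop here, nothing left to run
    if tag == "assign":
        _, x, e = cmd
        if x not in store:
            raise KeyError(f"{x} is not declared: no rule applies")
        return {**store, x: e(store)}, SKIP  # E[x/v]
    if tag == "seq":
        _, c1, c2 = cmd
        store1, flow = local(c1, store)
        if flow[0] == "skip":
            return local(c2, store1)         # c1 finished: continue with c2
        # c1 stopped at a barrier: c2 is appended to its continuation
        return store1, ("SYNC", ("seq", flow[1], c2))
    if tag == "if":
        _, e, c1, c2 = cmd
        return local(c1 if e(store) else c2, store)
    if tag == "while":
        _, e, body = cmd
        # a loop is evaluated through its one-step unfolding
        return local(("if", e, ("seq", body, cmd), SKIP), store)
    raise ValueError(f"unknown command {tag}")

# x := 1; sync; x := x + 1
prog = ("seq", ("assign", "x", lambda s: 1),
        ("seq", ("sync",), ("assign", "x", lambda s: s["x"] + 1)))
store1, flow1 = local(prog, {"x": 0})
print(store1, flow1[0])          # {'x': 1} SYNC
store2, flow2 = local(flow1[1], store1)
print(store2, flow2)             # {'x': 2} ('skip',)
```

A diverging program makes this function recurse forever (in practice Python raises `RecursionError`), which illustrates why divergence needs separate co-inductive rules rather than a result value.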

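Co-inductive rules (1)

\[
\frac{E,C,R \models^{i,p} c_1 \Downarrow^{\infty}_l}
     {E,C,R \models^{i,p} c_1;c_2 \Downarrow^{\infty}_l}
\]

\[
\frac{E,C,R \models^{i,p} c_1 \Downarrow_l \langle E_1,C_1,R_1,\mathtt{skip} \rangle
      \qquad
      E_1,C_1,R_1 \models^{i,p} c_2 \Downarrow^{\infty}_l}
     {E,C,R \models^{i,p} c_1;c_2 \Downarrow^{\infty}_l}
\]

\[
\frac{E,R \models^{i,p} e \Downarrow \mathtt{true}
      \qquad
      E,C,R \models^{i,p} c_1 \Downarrow^{\infty}_l}
     {E,C,R \models^{i,p} \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2\ \mathtt{endif} \Downarrow^{\infty}_l}
\]

\[
\frac{E,R \models^{i,p} e \Downarrow \mathtt{false}
      \qquad
      E,C,R \models^{i,p} c_2 \Downarrow^{\infty}_l}
     {E,C,R \models^{i,p} \mathtt{if}\ e\ \mathtt{then}\ c_1\ \mathtt{else}\ c_2\ \mathtt{endif} \Downarrow^{\infty}_l}
\]

\[
\frac{E,C,R \models^{i,p} \mathtt{if}\ e\ \mathtt{then}\ (c;\ \mathtt{while}\ e\ \mathtt{do}\ c\ \mathtt{done})\ \mathtt{else}\ \mathtt{skip}\ \mathtt{endif} \Downarrow^{\infty}_l}
     {E,C,R \models^{i,p} \mathtt{while}\ e\ \mathtt{do}\ c\ \mathtt{done} \Downarrow^{\infty}_l}
\]

\[
\frac{E,R \models^{i,p} e \Downarrow v
      \qquad x \notin E
      \qquad
      E[x/v],C,R \models^{i,p} c \Downarrow^{\infty}_l}
     {E,C,R \models^{i,p} \mathtt{declare}\ x := e\ \mathtt{begin}\ c\ \mathtt{end} \Downarrow^{\infty}_l}
\]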

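Rules for the BSP routines (1)

\[
\frac{}{E,C,R \models^{i,p} \mathtt{sync} \Downarrow_l \langle E,C,R,\mathrm{SYNC}(\mathtt{skip}) \rangle}
\]

\[
\frac{E,R \models^{i,p} e \Downarrow pid
      \qquad \{x \mapsto v\} \in E
      \qquad 0 \le pid < p}
     {E,C,R \models^{i,p} \mathtt{send}(x,e) \Downarrow_l \langle E,C',R,\mathtt{skip} \rangle}
\quad \text{with } C' = C \cup \{\mathtt{send}, pid, v\}
\]

\[
\frac{E,R \models^{i,p} e_1 \Downarrow pid
      \qquad E,R \models^{i,p} e_2 \Downarrow n
      \qquad \{pid, n, v\} \in R
      \qquad 0 \le pid < p}
     {E,R \models^{i,p} \mathtt{findmsg}(e_1,e_2) \Downarrow v}
\]

\[
\frac{n = |R|}{E,R \models^{i,p} \mathtt{nmsg} \Downarrow n}
\]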
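The message-passing rules can be read as three small functions. The sketch below rests on assumptions: the store is a Python `dict`, pending communications C are kept in a list (rather than a set) so that the sending order is preserved, received values R are triples `(sender, index, value)`, and the destination expression is taken to be already evaluated.

```python
def send(store, comms, x, dest, p):
    """send(x, e) with e already evaluated to dest: queue the value of x in C."""
    if x not in store or not 0 <= dest < p:
        raise LookupError("no rule applies to this send")
    return comms + [("send", dest, store[x])]   # C' = C ∪ {send, pid, v}

def findmsg(received, src, n):
    """findmsg(e1, e2): the n-th value received from processor src."""
    for sender, index, value in received:
        if sender == src and index == n:
            return value
    raise LookupError("no such message: the expression has no value")

def nmsg(received):
    """nmsg: the number of received values, n = |R|."""
    return len(received)

comms = send({"x": 5}, [], "x", dest=1, p=2)
print(comms)                                   # [('send', 1, 5)]
print(findmsg([(0, 0, 7), (0, 1, 8)], 0, 1))   # 8
```

Note that `send` leaves both the store and R untouched: the value only becomes visible to the destination after the next barrier.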

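Rules for the BSP routines (2)

\[
\frac{\{x \mapsto v\} \in E}
     {E,C,R \models^{i,p} \mathtt{push}(x) \Downarrow_l \langle E',C,R,\mathtt{skip} \rangle}
\quad \text{with } E' = E[x/v]
\]

where the binding of $x$ in $E'$ is marked as registered for global access.

\[
\frac{\{x \mapsto v\} \in E}
     {E,C,R \models^{i,p} \mathtt{pop}(x) \Downarrow_l \langle E',C,R,\mathtt{skip} \rangle}
\quad \text{with } E' = E[x/v]
\]

where the binding of $x$ in $E'$ is no longer marked as registered.

\[
\frac{E,R \models^{i,p} e \Downarrow pid
      \qquad \{x \mapsto v\} \in E
      \qquad \{y \mapsto v'\} \in E
      \qquad 0 \le pid < p}
     {E,C,R \models^{i,p} \mathtt{put}(e,x,y) \Downarrow_l \langle E,C',R,\mathtt{skip} \rangle}
\quad \text{with } C' = C \cup \{\mathtt{put}, pid, y, v\}
\]

\[
\frac{E,R \models^{i,p} e \Downarrow pid
      \qquad \{x \mapsto v\} \in E
      \qquad \{y \mapsto v'\} \in E
      \qquad 0 \le pid < p}
     {E,C,R \models^{i,p} \mathtt{get}(e,x,y) \Downarrow_l \langle E,C',R,\mathtt{skip} \rangle}
\quad \text{with } C' = C \cup \{\mathtt{get}, pid, x, y\}
\]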

Global Reductions (1)

BSP programs are SPMD ones so a program c is started p times.

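We model this as a p-vector of $c$ with its environments of execution, that is, the store $E$, the communications $C$ and the received values $R$. A final configuration is skip on all processors. We write the full evaluation as:

\[
\langle\langle E_0,C_0,R_0 \models^{i,p} c_0 \parallel \cdots \parallel E_{p-1},C_{p-1},R_{p-1} \models^{i,p} c_{p-1} \rangle\rangle
\;\Downarrow_g\;
\langle\langle E'_0,C'_0,R'_0,\mathtt{skip} \parallel \cdots \parallel E'_{p-1},C'_{p-1},R'_{p-1},\mathtt{skip} \rangle\rangle
\]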

Global Reductions (2)

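The reductions $\Downarrow_g$ call the local (sequential) ones $\Downarrow_l$ with the two following rules:

\[
\frac{\forall i \quad E_i,C_i,R_i \models^{i,p} c_i \Downarrow_l \langle E'_i,C'_i,R'_i,\mathtt{skip} \rangle}
     {\langle\langle E_0,C_0,R_0 \models^{i,p} c_0 \parallel \cdots \parallel E_{p-1},C_{p-1},R_{p-1} \models^{i,p} c_{p-1} \rangle\rangle
      \;\Downarrow_g\;
      \langle\langle E'_0,C'_0,R'_0,\mathtt{skip} \parallel \cdots \parallel E'_{p-1},C'_{p-1},R'_{p-1},\mathtt{skip} \rangle\rangle}
\]

\[
\frac{\exists i \quad E_i,C_i,R_i \models^{i,p} c_i \Downarrow^{\infty}_l}
     {\langle\langle E_0,C_0,R_0 \models^{i,p} c_0 \parallel \cdots \parallel E_{p-1},C_{p-1},R_{p-1} \models^{i,p} c_{p-1} \rangle\rangle \;\Downarrow^{\infty}_g}
\]

that is, either each processor computes a final configuration, or at least one processor diverges.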

Global Reductions (3)

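\[
\frac{\begin{array}{c}
        \forall i \quad E_i,C_i,R_i \models^{i,p} c_i \Downarrow_l \langle E'_i,C'_i,R'_i,\mathrm{SYNC}(c'_i) \rangle \\[4pt]
        \langle\langle \cdots \parallel \mathrm{Comm}(E'_i,C'_i,R'_i) \models^{i,p} c'_i \parallel \cdots \rangle\rangle
        \;\Downarrow_g\;
        \langle\langle \cdots \parallel E''_i,C''_i,R''_i,\mathtt{skip} \parallel \cdots \rangle\rangle
      \end{array}}
     {\langle\langle \cdots \parallel E_i,C_i,R_i \models^{i,p} c_i \parallel \cdots \rangle\rangle
      \;\Downarrow_g\;
      \langle\langle \cdots \parallel E''_i,C''_i,R''_i,\mathtt{skip} \parallel \cdots \rangle\rangle}
\]

\[
\frac{\begin{array}{c}
        \forall i \quad E_i,C_i,R_i \models^{i,p} c_i \Downarrow_l \langle E'_i,C'_i,R'_i,\mathrm{SYNC}(c'_i) \rangle \\[4pt]
        \langle\langle \cdots \parallel \mathrm{Comm}(E'_i,C'_i,R'_i) \models^{i,p} c'_i \parallel \cdots \rangle\rangle \;\Downarrow^{\infty}_g
      \end{array}}
     {\langle\langle \cdots \parallel E_i,C_i,R_i \models^{i,p} c_i \parallel \cdots \rangle\rangle \;\Downarrow^{\infty}_g}
\]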
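The shape of the global rules (run every processor locally, then either stop or communicate and start the next super-step) can be sketched with Python generators. This is an analogy under assumptions, not the semantics itself: each generator stands for one processor's program, each `yield` stands for reaching `sync`, and `on_barrier` is a hypothetical hook where the Comm function would be applied.

```python
def run_global(processes, on_barrier=None):
    """Drive all processors super-step by super-step.
    Returns the number of super-steps; raises if no global rule applies."""
    supersteps = 0
    while True:
        finished = []
        for proc in processes:
            try:
                next(proc)               # local reduction up to the next sync
                finished.append(False)   # stopped in a SYNC(c) configuration
            except StopIteration:
                finished.append(True)    # reached a skip configuration
        supersteps += 1
        if all(finished):
            return supersteps            # first rule: skip on every processor
        if any(finished):
            # neither "all skip" nor "all SYNC": no global rule applies
            raise RuntimeError("some processors ended while others wait at a barrier")
        if on_barrier is not None:
            on_barrier()                 # third rule: Comm, then continue

def worker(pid, rounds, log):
    """Hypothetical SPMD program: `rounds` computation phases, each ended by a sync."""
    for r in range(rounds):
        log.append((pid, r))
        yield

log = []
print(run_global([worker(0, 2, log), worker(1, 2, log)]))  # 3
```

Two barriers give three super-steps here. A processor that never reaches `sync` or the end makes `run_global` loop forever, which corresponds to the divergence rules and cannot be reported as a result.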

Communications

The Comm function specifies the order of the messages during the communications. It modifies the environment of each processor $i$ such that $\mathrm{Comm}(E'_i, C'_i, R'_i) = (E''_i, C''_i, R''_i)$ is, for BSMP, as follows:

\[
C''_i = \emptyset
\]

\[
R''_i = \bigcup_{j=0}^{p-1} \; \bigcup_{n=0}^{n_j} \Bigl\{ j,\; n + \sum_{a=0}^{j} n_a,\; v \Bigr\}
\quad \text{if } \{\mathtt{send}, i, v\} \in_n C'_j
\]

that is, we suppose that each processor $j$ has sent $n_j$ messages to $i$, and thus we take the $n$th message (noted $\in_n$) from this ordered set. DRMA accesses are defined as follows:

\[
E''_i = E'_i \left[ \; \bigcup_{j=0}^{p-1} [y/v] \;\; \bigcup_{j=0}^{p-1} [y'/v'] \; \right]
\quad \text{if }
\begin{cases}
\{y \mapsto v\} \in E'_j \text{ and } \{\mathtt{get}, j, x, y\} \in C'_i \\
\{y' \mapsto v\} \in E'_i \text{ and } \{\mathtt{put}, i, y', v'\} \in C'_j
\end{cases}
\]

That is, first, get accesses are performed in the natural order of processors (as a list of substitutions), and then put accesses finish the
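A possible reading of the communication phase is sketched below. Several points are assumptions made for illustration: requests are tuples `("send", dest, v)`, `("get", src, x, y)` and `("put", dest, y, v)`; a `get` copies variable `x` of processor `src` into the local variable `y`; and messages are indexed per sender starting at 0, so that `findmsg(j, 0)` denotes the first message from processor `j`.

```python
def comm(stores, comms):
    """Apply all pending communications at a barrier.
    stores[i] is the store of processor i, comms[i] its list of requests.
    Returns (new_stores, new_comms, received)."""
    p = len(stores)
    new_stores = [dict(s) for s in stores]
    received = [[] for _ in range(p)]

    # BSMP: messages are delivered sender by sender, in sending order.
    for j in range(p):
        sent_to = [0] * p                      # per-destination message index
        for req in comms[j]:
            if req[0] == "send":
                _, dest, v = req
                received[dest].append((j, sent_to[dest], v))
                sent_to[dest] += 1

    # DRMA, step 1: gets, in processor order, reading the pre-barrier stores.
    for i in range(p):
        for req in comms[i]:
            if req[0] == "get":
                _, src, x, y = req
                new_stores[i][y] = stores[src][x]

    # DRMA, step 2: puts, in processor order, applied after all gets.
    for j in range(p):
        for req in comms[j]:
            if req[0] == "put":
                _, dest, y, v = req
                new_stores[dest][y] = v

    return new_stores, [[] for _ in range(p)], received   # C'' is empty

stores = [{"x": 1, "y": 0}, {"x": 2, "y": 0}]
pending = [[("send", 1, 10), ("send", 1, 11), ("get", 1, "x", "y")],
           [("put", 0, "x", 99)]]
new_stores, new_comms, received = comm(stores, pending)
print(new_stores)  # [{'x': 99, 'y': 2}, {'x': 2, 'y': 0}]
print(received)    # [[], [(0, 0, 10), (0, 1, 11)]]
```

Doing all gets before any put means that a `get` always observes the value a variable had when the barrier was reached, even if another processor overwrites it with a `put` during the same super-step.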

Properties

Lemma

$\Downarrow_l$ is deterministic.

Lemma

$\Downarrow_g$ is deterministic.

Lemma

$\Downarrow_l$ and $\Downarrow^{\infty}_l$ are mutually exclusive.

Lemma

$\Downarrow_g$ and $\Downarrow^{\infty}_g$ are mutually exclusive.

Simple proofs

Prove that these programs diverge:

    while true do
      sync;
    done

    declare x := 0 begin
      declare y := 1 begin
        push(x); push(y);
        while x <> y do
          get(x, y, pid + 1);
          get(y, x, pid − 1);
          sync;
        done
      end
    end

Proof of a scientific computation: the N-body problem, definition

The classic N-body problem is to calculate the gravitational energy of N point masses:

\[
E = - \sum_{i=1}^{N} \; \sum_{\substack{j=1 \\ i \neq j}}^{N} \frac{m_i \times m_j}{r_i - r_j}
\]

To compute this sum, we show a classical parallel algorithm using a systolic loop. At the beginning, each processor holds, in its own memory, a sub-part of the N point masses as a list.
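As a sequential reference point, the double sum can be computed directly. The sketch below makes two assumptions that are not stated on the slide: positions are one-dimensional numbers, and the denominator is taken as the distance $|r_i - r_j|$ between two distinct particles.

```python
def gravitational_energy(masses, positions):
    """Direct O(N^2) evaluation of E = -sum over ordered pairs (i, j), i != j,
    of m_i * m_j / |r_i - r_j| (positions assumed pairwise distinct)."""
    n = len(masses)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += masses[i] * masses[j] / abs(positions[i] - positions[j])
    return -total

print(gravitational_energy([1.0, 2.0], [0.0, 1.0]))  # -4.0
```

Each unordered pair is counted twice, once as (i, j) and once as (j, i), exactly as in the double sum. This is what allows the parallel algorithm to let every processor account for its own side of each interaction.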

Principle of a systolic loop

1 Initially, each processor calculates the interactions among its own point masses.

2 Then it sends a copy of its particles to its right-hand neighbour, while at the same time receiving the particles of its left-hand neighbour. It calculates the interactions between its own particles and those that just came in, and then it sends a copy of its particles to its right-right-hand neighbour, etc.

3 After p−1 super-steps, all pairs of particles have been treated, and a parallel folding of these values can be done to finish the computation.
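The three steps above can be simulated sequentially to check that every ordered pair of particles is treated exactly once. The representation is an assumption made for illustration: a particle is a pair `(mass, position)` with one-dimensional, pairwise distinct positions, and `pair_energy` is a hypothetical stand-in for the local interaction function.

```python
from math import isclose

def pair_energy(group_a, group_b):
    """Sum of m_a * m_b / |r_a - r_b| over a in group_a and b in group_b,
    skipping a particle paired with itself (same position)."""
    total = 0.0
    for ma, ra in group_a:
        for mb, rb in group_b:
            if ra != rb:
                total += ma * mb / abs(ra - rb)
    return total

def systolic_energy(parts):
    """parts[pid] is the particle list held by processor pid."""
    p = len(parts)
    # Step 1: each processor treats the interactions among its own particles.
    energy = [pair_energy(own, own) for own in parts]
    # Step 2: at super-step y, processor pid receives the particles that
    # processor (pid - y) mod p sent to its y-th right-hand neighbour.
    for y in range(1, p):
        for pid in range(p):
            visiting = parts[(pid - y) % p]
            energy[pid] += pair_energy(visiting, parts[pid])
    # Step 3: fold the partial sums.
    return -sum(energy)

parts = [[(1.0, 0.0), (2.0, 1.0)], [(3.0, 3.0)], [(1.5, -2.0), (0.5, 5.0)]]
everyone = [particle for own in parts for particle in own]
print(isclose(systolic_energy(parts), -pair_energy(everyone, everyone)))  # True
```

With p processors the loop runs p−1 times, so each processor meets the particles of every other processor once; together with the initial local step, this covers all ordered pairs.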

Hypotheses

We suppose a function pair_energy that computes the local interactions.

For the parallel prefixes, we suppose that each processor binds a value in the variable x and, for the N-body computation, that each processor binds a list of particles in my_particles.

Code

Parallel direct prefixes:

    declare y := pid + 1 begin
      while (y < nprocs) do
        send(x, y);
        y := y + 1;
      done
      sync;
      y := 0;
      while (y < pid) do
        x := x + findmsg(y, 0);
        y := y + 1
      done;
    end

N-body computation:

    declare buffer := my_particles begin
      declare energy := 0 begin
        declare y := 0 begin
          push(my_particles);
          while (y < nprocs − 1) do
            energy := energy + pair_energy(buffer, my_particles);
            y := y + 1;
            get((y + pid) mod nprocs, buffer, my_particles);
            sync;
          done;
          energy := energy + pair_energy(buffer, my_particles);
          Code of prefixes for (energy);
        end
      end
    end
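The prefix program can be simulated for all processors at once to see what it computes. This is a sketch under assumptions: the list `xs` holds the initial value of `x` on each processor, received messages are triples `(sender, index, value)`, and the barrier is modelled simply by finishing the sending loop of every processor before any processor reads.

```python
def findmsg(received, src, n):
    """The n-th value received from processor src."""
    for sender, index, value in received:
        if sender == src and index == n:
            return value
    raise LookupError("no such message")

def parallel_prefix(xs):
    """Simulate the prefix code: xs[pid] is the initial x of processor pid."""
    p = len(xs)
    received = [[] for _ in range(p)]

    # Super-step 1: processor pid sends x to every processor y with pid < y < p.
    for pid in range(p):
        y = pid + 1
        while y < p:
            received[y].append((pid, 0, xs[pid]))  # one message per sender
            y += 1

    # sync: all messages are now available to their destinations.

    # Super-step 2: processor pid adds the value received from each y < pid.
    result = list(xs)
    for pid in range(p):
        y = 0
        while y < pid:
            result[pid] += findmsg(received[pid], y, 0)
            y += 1
    return result

print(parallel_prefix([1, 2, 3, 4]))  # [1, 3, 6, 10]
```

Processor `pid` ends with the sum of the values of processors 0 to `pid`, at the cost of a single barrier and of one message per pair of processors, which is why the slide calls it a direct prefix.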

Exercises

1 Assuming that each processor holds a datum of type float in x, formally prove that the prefix code does compute a prefix of the x values (the associative operator is assumed to be +).

2 Then prove that the N-body code (without the prefixes) does compute a partial sum.

3 Deduce that the code does compute the N-body problem.
