Frédéric Gava
L.A.C.L
Laboratoire d'Algorithmique, Complexité et Logique
M2 SSI course, PSSR option
Course outline
1 Natural semantics
2 Control flow and BSP primitives
3 Communications and global rules
4 Exercises
Natural semantics
A set of rules describing the behaviour of programs.
Also called big-step semantics.
It describes the evaluation of a program from start to finish.
One therefore builds a "tree" representing the evaluation.
If an error occurs, no rule can be applied (the tree cannot be built).
The opposite method is small-step semantics: a set of rules describing one reduction step of a program at a time.
The rules are applied until no further reduction is possible (error or end of the program).
Big-step and small-step are often mixed.
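To make the contrast concrete, here is a minimal sketch on a hypothetical toy language of integer additions (it is not the course's micro-language). The names `Num`, `Add`, `big_step`, `small_step` and `run_small_steps` are illustrative assumptions. `big_step` builds the whole evaluation in one recursive judgement, while `small_step` performs a single reduction that must be iterated.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add]


def big_step(e: Expr) -> int:
    """Big-step: relate e directly to its final value (e ⇓ v)."""
    if isinstance(e, Num):
        return e.value
    return big_step(e.left) + big_step(e.right)


def small_step(e: Expr) -> Expr:
    """Small-step: perform exactly one reduction (e → e')."""
    if isinstance(e, Num):
        raise ValueError("a value cannot be reduced further")
    if not isinstance(e.left, Num):
        return Add(small_step(e.left), e.right)
    if not isinstance(e.right, Num):
        return Add(e.left, small_step(e.right))
    return Num(e.left.value + e.right.value)


def run_small_steps(e: Expr) -> Tuple[int, int]:
    """Iterate small_step until a value is reached; return (value, step count)."""
    steps = 0
    while not isinstance(e, Num):
        e = small_step(e)
        steps += 1
    return e.value, steps


example = Add(Add(Num(1), Num(2)), Num(3))
```

Both styles agree on the result of `example`; only the small-step version exposes the intermediate terms, which is why errors and divergence are easier to observe there.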
Big-step: advantages and drawbacks
Simpler to read (sometimes not)
Better suited to proofs of programs (sometimes not)
Better suited to proofs of language properties (sometimes not)
Often more rules than in small-step
Too many details to describe in small-step semantics
Errors and diverging programs do not show up well in big-step
Sometimes small-step is too complicated to write
In short, neither is better than the other; it all depends on the context.
Induction and co-induction
Inference system
An inference system is a set of inference rules, which are ordered pairs $\frac{P}{c}$, where $c$ is the conclusion of the rule and $P$ the set of its premises. The intuitive interpretation of such a rule is that the judgement $c$ can be inferred from the set of judgements $P$.
Interpretation
In the proof-theoretic approach, the inductive interpretation of the inference system is the set of conclusions of well-founded derivations, while the co-inductive interpretation is the set of conclusions of arbitrary derivations (ill-founded or well-founded).
Proof principles
1 Induction principle ($\frac{P}{c}$): to prove that a property holds for all judgements of the inductive interpretation, proceed by structural induction over well-founded derivations. That is, show that the property holds for the conclusion $c$ of a derivation $d$, assuming it holds for all conclusions $j$ of the strict subderivations of $d$.
2 Co-induction principle ($\frac{P}{c}$): to prove that all judgements are in the co-inductive interpretation, build a system of recursive equations between derivations, with unknowns $(x_j)$. Each equation is of the form $x_j = \frac{x_{j_1} \quad x_{j_2} \quad \cdots}{c}$ and must be justified by an inference rule. These equations are guarded: infinite but rational.
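On a finite set of judgements, the two interpretations can be computed as the least and greatest fixed points of the "one rule application" operator, which is the standard fixed-point reading of the definitions above. The sketch below uses a hypothetical toy system with judgements `a` to `e`; the names `RULES`, `step`, `inductive` and `coinductive` are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy system: each judgement maps to the premise sets of its rules.
RULES: Dict[str, List[FrozenSet[str]]] = {
    "a": [frozenset()],        # axiom: a holds with no premise
    "b": [frozenset({"a"})],   # b is inferred from a
    "c": [frozenset({"c"})],   # c is inferred from c: only an infinite derivation exists
    "d": [frozenset({"e"})],   # d is inferred from e ...
    "e": [],                   # ... but e has no rule at all
}


def step(s: Set[str]) -> Set[str]:
    """Conclusions of the rules whose premises all belong to s."""
    return {j for j, rules in RULES.items() if any(p <= s for p in rules)}


def inductive() -> Set[str]:
    """Least fixed point: iterate upwards from the empty set."""
    s: Set[str] = set()
    while step(s) != s:
        s = step(s)
    return s


def coinductive() -> Set[str]:
    """Greatest fixed point: iterate downwards from the set of all judgements."""
    s = set(RULES)
    while step(s) != s:
        s = step(s)
    return s
```

Here `c` belongs only to the co-inductive interpretation, since its sole derivation is the infinite (but rational) one that loops on itself, while `d` belongs to neither.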
The micro-language
Overview
Our core language is the classical IMP with a set Exp of expressions (booleans, integers, matrices, etc.) with operations on them. The set X of variables is a subset of Exp with two special variables: pid and nprocs.
Syntax
c ::= skip                              Null command
    | x := e                            Assignment
    | c1 ; c2                           Sequence
    | if e then c1 else c2 endif        Conditional
    | while e do c done                 Iteration
    | declare y := e begin c end        New variable
with x, y ∈ X and e ∈ Exp.
The micro-language (2)
Definitions
$E_i$ is the store (the memory, as a mapping from variables to values) of processor $i$ and $R_i$ is the set of received values.
Expressions
Expressions are evaluated to values $v$ (a subset of Exp) and we write $E_i, R_i \models^{i,p} e \Downarrow v$, with $p$ the number of processors and $i$ the pid. We suppose that $E_i, R_i \models^{i,p} \mathbf{pid} \Downarrow i$ and $E_i, R_i \models^{i,p} \mathbf{nprocs} \Downarrow p$.
Evaluation of Exp is not total (e.g. the evaluation of 1 + true) but, for simplicity, it always terminates.
The micro-language (3)
Parallel primitives
    | sync            Synchronisation barrier
    | push(x)         Registers a variable x for global access
    | pop(x)          Removes x from global access
    | put(e,x,y)      Remote write of x into y on processor e
    | get(e,x,y)      Remote read from x into y
    | send(x,e)       Sends the value of x to processor e
Remarks
In contrast to a real BSP library, we use basic values instead of arbitrary buffer addresses (char *). Exp is extended with findmsg(i,e), which finds the e-th message of processor i from the previous super-step, and with nmsg, which returns the size of $R_i$ (i.e. the number of received values).
Notations
Environment
Our semantics is a set of inference rules. We write $E[x/v]$ for the insertion or substitution in $E$ of a new binding from $x$ to $v$. We write $R$ for the values received during the previous super-step and $C$ for the communications that need to be done in the current super-step.
Rules
Finite evaluations are written $\Downarrow$ and infinite ones $\Downarrow^{\infty}$ (to be read as "the program diverges").
Definitions
We write $\Downarrow_l$ for local reductions (i.e. one on each processor).
Local final configurations are:
end of computation: $\langle E', C', R', \mathbf{skip} \rangle$
intermediate local configuration: $\langle E', C', R', \mathrm{SYNC}(c) \rangle$, where $c$ is the sequence of remaining instructions for the next super-steps.
A variable x is marked differently depending on whether it has been registered for global access (DRMA) or not; the plain x is used when this does not matter.
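Inductive rules (1)
\[ \frac{E,C,R \models^{i,p} c_1 \Downarrow_l \langle E',C',R',\mathrm{SYNC}(c) \rangle}{E,C,R \models^{i,p} c_1;c_2 \Downarrow_l \langle E',C',R',\mathrm{SYNC}(c;c_2) \rangle} \]
\[ \frac{E,C,R \models^{i,p} c_1 \Downarrow_l \langle E_1,C_1,R_1,\mathbf{skip} \rangle \qquad E_1,C_1,R_1 \models^{i,p} c_2 \Downarrow_l \langle E_2,C_2,R_2,\mathit{Flow} \rangle}{E,C,R \models^{i,p} c_1;c_2 \Downarrow_l \langle E_2,C_2,R_2,\mathit{Flow} \rangle} \]
\[ \frac{}{E,C,R \models^{i,p} \mathbf{skip} \Downarrow_l \langle E,C,R,\mathbf{skip} \rangle} \]
\[ \frac{E,R \models^{i,p} e \Downarrow v \qquad x \in E}{E,C,R \models^{i,p} x := e \Downarrow_l \langle E[x/v],C,R,\mathbf{skip} \rangle} \]
where $\mathit{Flow} = \mathbf{skip}$ or $\mathrm{SYNC}(c)$.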
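Inductive rules (2)
\[ \frac{E,R \models^{i,p} e \Downarrow \mathit{true} \qquad E,C,R \models^{i,p} c_1 \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}{E,C,R \models^{i,p} \mathbf{if}\ e\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2\ \mathbf{endif} \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle} \]
\[ \frac{E,R \models^{i,p} e \Downarrow \mathit{false} \qquad E,C,R \models^{i,p} c_2 \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}{E,C,R \models^{i,p} \mathbf{if}\ e\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2\ \mathbf{endif} \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle} \]
\[ \frac{E,C,R \models^{i,p} \mathbf{if}\ e\ \mathbf{then}\ (c_1; \mathbf{while}\ e\ \mathbf{do}\ c_1\ \mathbf{done})\ \mathbf{else}\ \mathbf{skip}\ \mathbf{endif} \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}{E,C,R \models^{i,p} \mathbf{while}\ e\ \mathbf{do}\ c_1\ \mathbf{done} \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle} \]
\[ \frac{E,R \models^{i,p} e \Downarrow v \qquad x \notin E \qquad E[x/v],C,R \models^{i,p} c_1 \Downarrow_l \langle E',C',R',\mathit{Flow} \rangle}{E,C,R \models^{i,p} \mathbf{declare}\ x := e\ \mathbf{begin}\ c_1\ \mathbf{end} \Downarrow_l \langle E' \setminus \{x\},C',R',\mathit{Flow} \rangle} \]
where $\mathit{Flow} = \mathbf{skip}$ or $\mathrm{SYNC}(c)$.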
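The inductive rules can be read as a recursive interpreter. The following is a minimal sketch under stated assumptions: commands are encoded as tagged tuples, expressions as Python functions of the store, and the components C and R are left out because the rules shown so far pass them through unchanged. The names `local`, `SKIP`, `loop` and `prog` are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

Store = Dict[str, Any]
Expr = Callable[[Store], Any]   # assumption: an expression is a function of the store
SKIP = ("skip",)                # commands are tagged tuples (hypothetical encoding)


def local(E: Store, c: Tuple) -> Tuple[Store, Tuple]:
    """Big-step local evaluation: returns (E', Flow), Flow being SKIP or ("SYNC", rest)."""
    tag = c[0]
    if tag == "skip":
        return E, SKIP
    if tag == "sync":
        return E, ("SYNC", SKIP)
    if tag == "assign":                       # x := e, x must already be bound in E
        _, x, e = c
        if x not in E:
            raise LookupError(f"no rule applies: {x} is not declared")
        return {**E, x: e(E)}, SKIP
    if tag == "seq":
        _, c1, c2 = c
        E1, flow = local(E, c1)
        if flow[0] == "SYNC":                 # stop here and record what is left to run
            return E1, ("SYNC", ("seq", flow[1], c2))
        return local(E1, c2)
    if tag == "if":
        _, e, c1, c2 = c
        cond = e(E)
        if cond is True:
            return local(E, c1)
        if cond is False:
            return local(E, c2)
        raise TypeError("no rule applies: the condition is not a boolean")
    if tag == "while":                        # unfold one iteration into a conditional
        _, e, body = c
        return local(E, ("if", e, ("seq", body, c), SKIP))
    if tag == "declare":                      # declare x := e begin body end
        _, x, e, body = c
        if x in E:
            raise LookupError(f"no rule applies: {x} is already declared")
        E1, flow = local({**E, x: e(E)}, body)
        return {k: v for k, v in E1.items() if k != x}, flow
    raise ValueError(f"unknown command {tag!r}")


# declare i := 0 begin while i < 3 do x := x + i; i := i + 1 done end; sync; x := x * 2
loop = ("while", lambda E: E["i"] < 3,
        ("seq", ("assign", "x", lambda E: E["x"] + E["i"]),
                ("assign", "i", lambda E: E["i"] + 1)))
prog = ("seq", ("declare", "i", lambda E: 0, loop),
        ("seq", ("sync",), ("assign", "x", lambda E: E["x"] * 2)))
```

Running `local({"x": 0}, prog)` stops at the barrier with a `SYNC` flow; calling `local` again on the returned continuation finishes the program. An error (no applicable rule) shows up as an exception, and a diverging program simply never returns, which is exactly why the co-inductive rules are needed as a separate judgement.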
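Co-inductive rules (1)
\[ \frac{E,C,R \models^{i,p} c_1 \Downarrow^{\infty}_l}{E,C,R \models^{i,p} c_1;c_2 \Downarrow^{\infty}_l} \]
\[ \frac{E,C,R \models^{i,p} c_1 \Downarrow_l \langle E_1,C_1,R_1,\mathbf{skip} \rangle \qquad E_1,C_1,R_1 \models^{i,p} c_2 \Downarrow^{\infty}_l}{E,C,R \models^{i,p} c_1;c_2 \Downarrow^{\infty}_l} \]
\[ \frac{E,R \models^{i,p} e \Downarrow \mathit{true} \qquad E,C,R \models^{i,p} c_1 \Downarrow^{\infty}_l}{E,C,R \models^{i,p} \mathbf{if}\ e\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2\ \mathbf{endif} \Downarrow^{\infty}_l} \]
\[ \frac{E,R \models^{i,p} e \Downarrow \mathit{false} \qquad E,C,R \models^{i,p} c_2 \Downarrow^{\infty}_l}{E,C,R \models^{i,p} \mathbf{if}\ e\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2\ \mathbf{endif} \Downarrow^{\infty}_l} \]
\[ \frac{E,C,R \models^{i,p} \mathbf{if}\ e\ \mathbf{then}\ (c; \mathbf{while}\ e\ \mathbf{do}\ c\ \mathbf{done})\ \mathbf{else}\ \mathbf{skip}\ \mathbf{endif} \Downarrow^{\infty}_l}{E,C,R \models^{i,p} \mathbf{while}\ e\ \mathbf{do}\ c\ \mathbf{done} \Downarrow^{\infty}_l} \]
\[ \frac{E,R \models^{i,p} e \Downarrow v \qquad x \notin E \qquad E[x/v],C,R \models^{i,p} c \Downarrow^{\infty}_l}{E,C,R \models^{i,p} \mathbf{declare}\ x := e\ \mathbf{begin}\ c\ \mathbf{end} \Downarrow^{\infty}_l} \]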
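As a worked instance of these rules, consider the hypothetical program $W = \mathbf{while}\ \mathit{true}\ \mathbf{do}\ \mathbf{skip}\ \mathbf{done}$ (an illustration, not one of the course's examples). One possible way to establish $E,C,R \models^{i,p} W \Downarrow^{\infty}_l$ with the co-induction principle is the following system of three guarded equations, with unknowns $x_W$, $x_{\mathit{if}}$ and $x_{\mathit{seq}}$; each equation is an instance of one rule above (while, conditional on true, second sequence rule).

```latex
\begin{align*}
x_W &= \frac{x_{\mathit{if}}}
            {E,C,R \models^{i,p} W \Downarrow^{\infty}_l} \\[1ex]
x_{\mathit{if}} &= \frac{E,R \models^{i,p} \mathit{true} \Downarrow \mathit{true} \qquad x_{\mathit{seq}}}
            {E,C,R \models^{i,p} \mathbf{if}\ \mathit{true}\ \mathbf{then}\ (\mathbf{skip}; W)\ \mathbf{else}\ \mathbf{skip}\ \mathbf{endif} \Downarrow^{\infty}_l} \\[1ex]
x_{\mathit{seq}} &= \frac{E,C,R \models^{i,p} \mathbf{skip} \Downarrow_l \langle E,C,R,\mathbf{skip} \rangle \qquad x_W}
            {E,C,R \models^{i,p} \mathbf{skip}; W \Downarrow^{\infty}_l}
\end{align*}
```

The derivation obtained by unfolding these equations is infinite, but it is rational: it has only three distinct subderivations.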
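Rules for the BSP routines (1)
\[ \frac{}{E,C,R \models^{i,p} \mathbf{sync} \Downarrow_l \langle E,C,R,\mathrm{SYNC}(\mathbf{skip}) \rangle} \]
\[ \frac{E,R \models^{i,p} e \Downarrow \mathit{pid} \qquad \{x \mapsto v\} \in E \qquad 0 \le \mathit{pid} < p}{E,C,R \models^{i,p} \mathbf{send}(x,e) \Downarrow_l \langle E,C',R,\mathbf{skip} \rangle} \quad \text{with } C' = C \cup \{\mathbf{send}, \mathit{pid}, v\} \]
\[ \frac{E,R \models^{i,p} e_1 \Downarrow \mathit{pid} \qquad E,R \models^{i,p} e_2 \Downarrow n \qquad \{\mathit{pid}, n, v\} \in R \qquad 0 \le \mathit{pid} < p}{E,R \models^{i,p} \mathbf{findmsg}(e_1,e_2) \Downarrow v} \]
\[ \frac{n = |R|}{E,R \models^{i,p} \mathbf{nmsg} \Downarrow n} \]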
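Rules for the BSP routines (2)
\[ \frac{\{x \mapsto v\} \in E}{E,C,R \models^{i,p} \mathbf{push}(x) \Downarrow_l \langle E',C,R,\mathbf{skip} \rangle} \quad \text{with } E' = E[x/v] \]
\[ \frac{\{x \mapsto v\} \in E}{E,C,R \models^{i,p} \mathbf{pop}(x) \Downarrow_l \langle E',C,R,\mathbf{skip} \rangle} \quad \text{with } E' = E[x/v] \]
\[ \frac{E,R \models^{i,p} e \Downarrow \mathit{pid} \qquad \{x \mapsto v\} \in E \qquad \{y \mapsto v'\} \in E \qquad 0 \le \mathit{pid}}{E,C,R \models^{i,p} \mathbf{put}(e,x,y) \Downarrow_l \langle E,C',R,\mathbf{skip} \rangle} \quad \text{with } C' = C \cup \{\mathbf{put}, \mathit{pid}, y, v\} \]
\[ \frac{E,R \models^{i,p} e \Downarrow \mathit{pid} \qquad \{x \mapsto v\} \in E \qquad \{y \mapsto v'\} \in E \qquad 0 \le \mathit{pid}}{E,C,R \models^{i,p} \mathbf{get}(e,x,y) \Downarrow_l \langle E,C',R,\mathbf{skip} \rangle} \quad \text{with } C' = C \cup \{\mathbf{get}, \mathit{pid}, x, y\} \]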
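The communication routines only record a request in C; nothing is exchanged before the barrier. The sketch below mirrors the rules for send, put, get, findmsg and nmsg under a few assumptions: C is kept as a Python list (so that the sending order is available later), R is a list of `(sender, index, value)` triples, the target pid is passed already evaluated, and the upper bound on the pid for put and get is taken to be the same as in the send rule. All function names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Store = Dict[str, Any]
Comms = List[Tuple]                      # C: pending requests, in issue order (assumption)
Received = List[Tuple[int, int, Any]]    # R: (sender pid, message index, value)


def _check(E: Store, names: List[str], pid: int, p: int) -> None:
    """Side conditions of the rules: variables are bound and the pid is valid."""
    if any(n not in E for n in names) or not (0 <= pid < p):
        raise LookupError("no rule applies")


def send(E: Store, C: Comms, x: str, dest: int, p: int) -> Comms:
    _check(E, [x], dest, p)
    return C + [("send", dest, E[x])]     # the value of x is captured now


def put(E: Store, C: Comms, dest: int, x: str, y: str, p: int) -> Comms:
    _check(E, [x, y], dest, p)
    return C + [("put", dest, y, E[x])]   # write the current value of x into remote y


def get(E: Store, C: Comms, src: int, x: str, y: str, p: int) -> Comms:
    _check(E, [x, y], src, p)
    return C + [("get", src, x, y)]       # the remote value is only read at the barrier


def findmsg(R: Received, pid: int, n: int, p: int) -> Any:
    if not (0 <= pid < p):
        raise LookupError("no rule applies")
    for sender, index, value in R:
        if sender == pid and index == n:
            return value
    raise LookupError("no rule applies: no such message")


def nmsg(R: Received) -> int:
    return len(R)
```

Note that E is left unchanged by send, put and get, exactly as in the conclusions of the rules: only the set of pending communications grows.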
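Global reductions (1)
BSP programs are SPMD programs, so a program $c$ is started $p$ times.
We model this as a $p$-vector of $c$ with its execution environments, that is, the store $E$, the communications $C$ and the received values $R$. A final configuration is skip on all processors. We write the full evaluation:
\[ \langle\!\langle E_0,C_0,R_0 \models^{i,p} c_0 \parallel \cdots \parallel E_{p-1},C_{p-1},R_{p-1} \models^{i,p} c_{p-1} \rangle\!\rangle \Downarrow_g \langle\!\langle E'_0,C'_0,R'_0,\mathbf{skip} \parallel \cdots \parallel E'_{p-1},C'_{p-1},R'_{p-1},\mathbf{skip} \rangle\!\rangle \]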
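Global reductions (2)
The reductions $\Downarrow_g$ call the local (sequential) ones $\Downarrow_l$ with the two following rules:
\[ \frac{\forall i \quad E_i,C_i,R_i \models^{i,p} c_i \Downarrow_l \langle E'_i,C'_i,R'_i,\mathbf{skip} \rangle}{\langle\!\langle E_0,C_0,R_0 \models^{i,p} c_0 \parallel \cdots \parallel E_{p-1},C_{p-1},R_{p-1} \models^{i,p} c_{p-1} \rangle\!\rangle \Downarrow_g \langle\!\langle E'_0,C'_0,R'_0,\mathbf{skip} \parallel \cdots \parallel E'_{p-1},C'_{p-1},R'_{p-1},\mathbf{skip} \rangle\!\rangle} \]
\[ \frac{\exists i \quad E_i,C_i,R_i \models^{i,p} c_i \Downarrow^{\infty}_l}{\langle\!\langle E_0,C_0,R_0 \models^{i,p} c_0 \parallel \cdots \parallel E_{p-1},C_{p-1},R_{p-1} \models^{i,p} c_{p-1} \rangle\!\rangle \Downarrow^{\infty}_g} \]
that is, either each processor computes a final configuration, or at least one processor diverges.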
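Global reductions (3)
\[ \frac{\forall i \quad E_i,C_i,R_i \models^{i,p} c_i \Downarrow_l \langle E'_i,C'_i,R'_i,\mathrm{SYNC}(c'_i) \rangle \qquad \langle\!\langle \cdots \parallel \mathrm{Comm}(E'_i,C'_i,R'_i) \models^{i,p} c'_i \parallel \cdots \rangle\!\rangle \Downarrow_g \langle\!\langle \cdots \parallel E''_i,C''_i,R''_i,\mathbf{skip} \parallel \cdots \rangle\!\rangle}{\langle\!\langle \cdots \parallel E_i,C_i,R_i \models^{i,p} c_i \parallel \cdots \rangle\!\rangle \Downarrow_g \langle\!\langle \cdots \parallel E''_i,C''_i,R''_i,\mathbf{skip} \parallel \cdots \rangle\!\rangle} \]
\[ \frac{\forall i \quad E_i,C_i,R_i \models^{i,p} c_i \Downarrow_l \langle E'_i,C'_i,R'_i,\mathrm{SYNC}(c'_i) \rangle \qquad \langle\!\langle \cdots \parallel \mathrm{Comm}(E'_i,C'_i,R'_i) \models^{i,p} c'_i \parallel \cdots \rangle\!\rangle \Downarrow^{\infty}_g}{\langle\!\langle \cdots \parallel E_i,C_i,R_i \models^{i,p} c_i \parallel \cdots \rangle\!\rangle \Downarrow^{\infty}_g} \]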
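The shape of the global rules (run every processor up to its next barrier, exchange data, continue) can be sketched as a small driver. In this sketch, which is only an illustration, each processor is a Python generator that yields at every sync, and `comm` is a caller-supplied exchange function standing in for Comm; the names `run_global`, `program`, `unbalanced` and `all_to_all` are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

State = Dict[str, Any]


def run_global(make_proc: Callable[[int, int, State], Iterator[None]],
               p: int,
               comm: Callable[[List[State]], None]) -> Tuple[List[State], int]:
    """Run p copies of an SPMD program; return the final states and the barrier count."""
    states: List[State] = [dict() for _ in range(p)]
    procs = [make_proc(i, p, states[i]) for i in range(p)]
    barriers = 0
    while True:
        flows = []
        for proc in procs:                 # local reductions, one per processor
            try:
                next(proc)
                flows.append("SYNC")       # stopped at a barrier
            except StopIteration:
                flows.append("skip")       # finished
        if all(f == "skip" for f in flows):
            return states, barriers        # first global rule: everybody is done
        if any(f == "skip" for f in flows):
            raise RuntimeError("no global rule applies: only some processors reached sync")
        comm(states)                       # third rule: communicate, then go on
        barriers += 1


def program(pid: int, p: int, s: State) -> Iterator[None]:
    s["out"] = pid                         # value offered to the other processors
    yield                                  # sync
    s["total"] = sum(s["inbox"])


def unbalanced(pid: int, p: int, s: State) -> Iterator[None]:
    if pid == 0:
        yield                              # only processor 0 waits at a barrier


def all_to_all(states: List[State]) -> None:
    values = [s["out"] for s in states]
    for s in states:
        s["inbox"] = list(values)
```

A configuration in which some processors finish while others wait at a barrier matches no rule, which the driver reports as an error. Global divergence is not detected: the driver simply never returns.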
Communications
The Comm function specifies the order of the messages during the communications. It modifies the environment of each processor $i$ such that $\mathrm{Comm}(E'_i,C'_i,R'_i) = (E''_i,C''_i,R''_i)$. For BSMP it is defined as follows:
\[ C''_i = \emptyset \]
\[ R''_i = \bigcup_{j=0}^{p-1} \bigcup_{n=0}^{n_j} \Big\{ j,\; n + \sum_{a=0}^{j} n_a,\; v \Big\} \quad \text{if } \{\mathbf{send}, i, v\} \in_n C'_j \]
that is, we suppose that each processor $j$ has sent $n_j$ messages to $i$, and thus we take the $n$-th message (written $\in_n$) from this ordered set. DRMA accesses are defined as follows:
\[ E''_i = E'_i \left[ \bigcup_{j=0}^{p-1} [y/v] \;\; \bigcup_{j=0}^{p-1} [y'/v'] \right] \quad \text{if } \begin{cases} \{y \mapsto v\} \in E'_j \text{ and } \{\mathbf{get}, j, x, y\} \in C'_i \\ \{y' \mapsto v\} \in E'_i \text{ and } \{\mathbf{put}, i, y', v'\} \in C'_j \end{cases} \]
That is, first the get accesses are performed in the natural order of processors (as a list of substitutions), and then the put accesses finish the
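A possible executable reading of Comm is sketched below. It makes several simplifying assumptions that go beyond the formulas: each C is a list of tagged tuples in issue order, messages are indexed from 0 per sender (so that `findmsg(j, 0)` is the first message sent by `j`, without the offset that appears in the formula), and a get reads variable x in the store of the remote processor as it was before any write of the super-step. The name `comm` and the tuple encoding are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Store = Dict[str, Any]


def comm(Es: List[Store], Cs: List[List[Tuple]]
         ) -> Tuple[List[Store], List[List[Tuple]], List[List[Tuple]]]:
    """Return (E'', C'', R'') for all processors at the end of a super-step."""
    p = len(Es)

    # BSMP: R''_i collects the messages sent to i, ordered by sender then by rank.
    Rs: List[List[Tuple]] = []
    for i in range(p):
        received = []
        for j in range(p):
            rank = 0
            for msg in Cs[j]:
                if msg[0] == "send" and msg[1] == i:
                    received.append((j, rank, msg[2]))
                    rank += 1
        Rs.append(received)

    # DRMA: all get accesses first, in processor order, then all put accesses.
    new_Es = [dict(E) for E in Es]
    for i in range(p):
        for msg in Cs[i]:
            if msg[0] == "get":
                _, j, x, y = msg
                new_Es[i][y] = Es[j][x]      # read the remote value as it was before the barrier
    for j in range(p):
        for msg in Cs[j]:
            if msg[0] == "put":
                _, i, y, v = msg
                new_Es[i][y] = v             # the value v was captured when put was issued

    return new_Es, [[] for _ in range(p)], Rs   # C'' is empty on every processor
```

Because puts are applied after gets, a get never observes a value written by a put of the same super-step in this sketch.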
Properties
Lemma
$\Downarrow_l$ is deterministic.
Lemma
$\Downarrow_g$ is deterministic.
Lemma
$\Downarrow_l$ and $\Downarrow^{\infty}_l$ are mutually exclusive.
Lemma
$\Downarrow_g$ and $\Downarrow^{\infty}_g$ are mutually exclusive.
Simple proofs
Prove that these programs diverge:

while true do
  sync;
done

declare x := 0 begin
  declare y := 1 begin
    push(x); push(y);
    while x <> y do
      get(pid + 1, x, y);
      get(pid − 1, y, x);
      sync;
    done
  end
end
Proofs for a scientific computation: the N-body problem, definition
The classic N-body problem is to calculate the gravitational energy of N point masses:
\[ E = - \sum_{\substack{i=1 \\ i \neq j}}^{N} \sum_{j=1}^{N} \frac{m_i \times m_j}{r_i - r_j} \]
To compute this sum, we show a classical parallel algorithm using a systolic loop. At the beginning, each processor holds in its own memory a sub-part of the N point masses, as a list.
Principle of a systolic loop
1 Initially, each processor calculates the interactions among its own point masses.
2 Then it sends a copy of its particles to its right-hand neighbour, while at the same time receiving the particles from its left-hand neighbour. It calculates the interactions between its own particles and those that just came in, and then it sends a copy of its particles to the neighbour two places to its right, etc.
3 After p−1 super-steps, all pairs of particles have been treated and a parallel folding of these values can be done to finish the computation.
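The three steps above can be checked on a sequential simulation. This is a sketch under assumptions that are not in the course: particles are `(identifier, mass, position)` triples with one-dimensional positions, the distance is taken as an absolute value, and `pair_energy` sums over ordered pairs of distinct particles. The final `sum` plays the role of the parallel folding.

```python
from typing import List, Tuple

Particle = Tuple[int, float, float]   # (identifier, mass, position); 1-D positions assumed


def pair_energy(own: List[Particle], other: List[Particle]) -> float:
    """Interactions between every particle of own and every distinct particle of other."""
    return -sum(m1 * m2 / abs(r1 - r2)
                for (i1, m1, r1) in own
                for (i2, m2, r2) in other
                if i1 != i2)


def direct_energy(particles: List[Particle]) -> float:
    """Reference value: the double sum over all ordered pairs of distinct particles."""
    return pair_energy(particles, particles)


def systolic_energy(blocks: List[List[Particle]]) -> float:
    """Simulate the systolic loop; blocks[i] is the list owned by processor i."""
    p = len(blocks)
    buffers = list(blocks)            # buffers[i]: block currently visiting processor i
    partial = [0.0] * p
    for step in range(p):
        if step > 0:
            # one super-step: every processor receives the block of its left-hand neighbour
            buffers = [buffers[(i - 1) % p] for i in range(p)]
        for i in range(p):
            partial[i] += pair_energy(blocks[i], buffers[i])
    return sum(partial)               # folding of the p partial values


blocks = [[(0, 1.0, 0.0), (1, 2.0, 1.0)],
          [(2, 3.0, 3.0)],
          [(3, 1.5, 6.0), (4, 0.5, 10.0)]]
everything = [q for block in blocks for q in block]
```

After `step` shifts, processor `i` holds the block of processor `(i − step) mod p`, so over the `p − 1` super-steps every processor meets every other block exactly once; the two functions therefore agree up to floating-point rounding.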
Assumptions
We suppose a function pair_energy that computes the local interactions.
For the parallel prefixes, we suppose that each processor binds a value to the variable x; for the N-body computation, that each processor binds a list of particles to my_particles.
Code
Parallel direct prefixes:

declare y := pid + 1
begin
  while (y < nprocs) do
    send(x, y);
    y := y + 1;
  done
  sync;
  y := 0;
  while (y < pid) do
    x := x + findmsg(y, 0);
    y := y + 1
  done;
end

N-body computation:

declare buffer := my_particles begin
  declare energy := 0 begin
    declare y := 0 begin
      push(my_particles);
      while (y < nprocs − 1) do
        energy := energy + pair_energy(buffer, my_particles);
        y := y + 1;
        get((y + pid) mod nprocs, buffer, my_particles);
        sync;
      done;
      energy := energy + pair_energy(buffer, my_particles);
      (code of the prefixes, applied to energy);
    end
  end
end
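The direct prefix code can be simulated sequentially to see what it computes. The sketch below is an illustration only: `direct_prefix` is a hypothetical name, `xs[pid]` plays the role of the variable x on processor `pid`, and the dictionary `inbox[dest][sender]` stands for the single message that `findmsg(sender, 0)` would return on processor `dest`.

```python
from typing import Dict, List


def direct_prefix(xs: List[float]) -> List[float]:
    """Simulate the two super-steps of the direct prefix on p = len(xs) processors."""
    p = len(xs)

    # Super-step 1: processor pid sends x to every processor y with pid < y < p.
    inbox: List[Dict[int, float]] = [dict() for _ in range(p)]
    for pid in range(p):
        y = pid + 1
        while y < p:
            inbox[y][pid] = xs[pid]      # send(x, y)
            y += 1

    # sync: the messages become available in the next super-step.

    # Super-step 2: processor pid adds the values received from processors 0 .. pid-1.
    result = list(xs)
    for pid in range(p):
        y = 0
        while y < pid:
            result[pid] += inbox[pid][y]   # x := x + findmsg(y, 0)
            y += 1
    return result
```

For instance, `direct_prefix([1, 2, 3, 4])` yields the running sums `[1, 3, 6, 10]`, which is the property that the first exercise asks to prove from the semantics.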
Exercises
1 Assuming that each processor holds a value of type float in x, prove formally that the prefix code does compute a prefix of the x values (we assume that the associative operator is +).
2 Then prove that the N-body code (without the prefix part) does compute a partial sum.
3 Deduce that the code does compute the N-body energy.