• Aucun résultat trouvé

Parallel Code Generation of Synchronous Programs for a Many-core Architecture Amaury Graillat

N/A
N/A
Protected

Academic year: 2022

Partager "Parallel Code Generation of Synchronous Programs for a Many-core Architecture Amaury Graillat"

Copied!
114
0
0

Texte intégral

(1)

Parallel Code Generation of Synchronous Programs for a Many-core Architecture

Amaury Graillat

Supervisors: Pascal Raymond (Verimag), Matthieu Moy (LIP) Benoˆıt Dupont de Dinechin (Kalray)

Reviewers: Jan Reineke (Universit ¨at des Saarlandes)

Robert de Simone (INRIA Sophia-Antipolis, Aoste) Examiners: Anne Bouillard (ENS Paris)

Alain Girault (INRIA Grenoble) Christine Rochange (IRIT)

Thesis defense, November 16th, 2018

(2)

S PACE , W EIGHT , AND P OWER

(3)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

S PACE , W EIGHT , AND P OWER

(4)

S AFETY C RITICAL S YSTEMS

Example of aircraft flight controller (control-command)

Sensors

Actuator

(altitude) Processor

feedback Pilot (stick)

I Time-critical: latency constraints are part of the specification (e.g. < some 10ms)

I Designed with eg., Synchronous Languages

(5)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

T HE S YNCHRONOUS D ATA -F LOW L ANGUAGES (1/2)

pre init

Node 6 Node 1

Node 2

Node 3

Node 4

Node 5

I Lustre (academic), Scade (industrial)

I Network of nodes

(6)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

T HE S YNCHRONOUS D ATA -F LOW L ANGUAGES (2/2)

Logical

i

n−1

= 1 i

n+1

= 7

o

n

= 6 o

n+1

= 14

o

n−1

= 2 ... ...

i

n

= 3

instant n-1 instant n instant n+1

t ×

i 2 o

I One execution is called a logical instant

i

n−1

i

n+1

o

n+1

o

n

o

n−1

... ...

i

n

Step

Step Step

instant n-1 instant n instant n+1

t

I Requirement: o n before i n+1 .

I Worst-Case Execution Time (WCET)

(7)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

T HE S YNCHRONOUS D ATA -F LOW L ANGUAGES (2/2)

Logical

i

n−1

= 1 i

n+1

= 7

o

n

= 6 o

n+1

= 14

o

n−1

= 2 ... ...

i

n

= 3

instant n-1 instant n instant n+1

t ×

i 2 o

I One execution is called a logical instant Physical

i

n−1

i

n+1

o

n+1

o

n

o

n−1

... ...

i

n

Step

Step Step

instant n-1 instant n instant n+1

t

I Requirement: o n before i n+1 .

I Worst-Case Execution Time (WCET)

(8)

L USTRE /S CADE D ELAY O PERATOR

x 2

pre

+

e=3 e=7

init=0

e

instant 1 instant 2 instant 3

t

t t

e=1

b=1 b=4 b=9

a=1 a=2 a=3

a

init=0

d c=1 d=0 c=2 d=1 c=3 d=4 c

b

I previous operator

I Logical/functional delay

(9)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C ODE G ENERATION FOR S INGLE -C ORE

Node 1

Node 2

Node 3

Single-core processor

.c Code Generation

for each period { i = sensors();

o = main_step(i);

actuators(o);

}

Formal semantics Determinism

N1 N2 N3 ...

WCET Static schedule

Synchronous languages

Lustre, Heptagon, SCADE

Industrial use: SCADE in Airbus A380 Static schedule

o1 = N1_step(i);

o2 = N2_step(i);

returns N3_step(o1, o2);

void main_step(i) {

} Compilation

OK for sequential execution. What about parallel?

(10)

C ODE G ENERATION FOR S INGLE -C ORE

Node 1

Node 2

Node 3

Single-core processor

.c Code Generation

for each period { i = sensors();

o = main_step(i);

actuators(o);

}

Formal semantics Determinism

N1 N2 N3 ...

WCET Static schedule

Synchronous languages

Lustre, Heptagon, SCADE

Industrial use: SCADE in Airbus A380 Static schedule

o1 = N1_step(i);

o2 = N2_step(i);

returns N3_step(o1, o2);

void main_step(i) {

} Compilation

OK for sequential execution. What about parallel?

(11)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

T HE K ALRAY MPPA2

Cluster (=multi-core)

Core

Properties of the Kalray MPPA2:

I Cores

No complex branch prediction Only LRU caches

I Cluster

Banked Shared-Memory (16*128ko) Independent arbiter for each memory bank.

I Network-on-Chip (NoC) between clusters Bandwidth limiter (Network calculus possible)

Performance + Predictability

(12)

O VERVIEW

Kalray MPPA2 Bostan

Node 3 Node 1 Node 2

Node 4

Node 5 Node 6

pre init

(13)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

Part I: Semantics Preserving Parallelization

I Extraction of Parallelism I Mapping / Scheduling I Code Generation

I Communication Channels Implementation

Part II: Real-Time Guarantees

Part III: Evaluation and Conclusion

(14)

E XTRACTION OF P ARALLELISM

N1 N2 N3 N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code Contribution

External tool

Executable for Kalray

NoC Routing scheduling

Non-preemptive Mapping +

Code Generation System + Communication

N3 N2

N1 N4

N5 N6 pre init

Parallelism Extraction Method 1:

I In the top-level node:

I 1 node = 1 task (runnable).

I Generation of sequential code for each node

I Similar to Architecture Description Language (Prelude, Giotto) Method 2: based on fork-join:

I see manuscript

(15)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

E XTRACTION OF P ARALLELISM

N1 N2 N3 N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing N3

N2

N1 N4

N5 N6 pre init

Parallelism Extraction Method 1:

I In the top-level node:

I 1 node = 1 task (runnable).

I Generation of sequential code for each node

I Similar to Architecture Description Language (Prelude, Giotto) Method 2: based on fork-join:

I see manuscript

(16)

M APPING / S CHEDULING

N1 N2 N3 N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing N3

N2

N1 N4

N5 N6 pre init

Parallelism Extraction

I Need non-preemptive static schedule I External scheduling tool

I We can easily check the schedule

(17)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

M APPING /S CHEDULING : E XAMPLE

Node 3 Node 5

Node 1

Node 2

Node 4

Node 6

I Core 0: N1 ; N4 ; N6 Core 1: N2 ; N5 Core 2: N3

I Schedule checked using the dependency graph

(18)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C ODE G ENERATION

N1 N2 N3 N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing N3

N2

N1 N4

N5 N6 pre init

Parallelism Extraction

input”.

Sketch of code for core 0. for each period{

wait_inputs_N1(); // N1 N1_step();

write_outputs_N1(); wait_inputs_N4(); // N4 N4_step();

write_outputs_N4(); wait_inputs_N6(); // N6 N6_step();

write_outputs_N6();

}

(19)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C ODE G ENERATION

N1 N2 N3 N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing N3

N2

N1 N4

N5 N6 pre init

Parallelism Extraction

I Dependencies compiled into “wait for input”.

Sketch of code for core 0.

for each period{

wait_inputs_N1(); // N1 N1_step();

write_outputs_N1();

wait_inputs_N4(); // N4 N4_step();

write_outputs_N4();

wait_inputs_N6(); // N6 N6_step();

write_outputs_N6();

}

(20)

C OMMUNICATION C HANNELS

N1 N2 N3 N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing N3

N2

N1 N4

N5 N6 pre init

Parallelism Extraction

I Two kinds of communications:

- instantaneous (→)

- delayed (→)

(21)

16/48

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

I NSTANTANEOUS C OMMUNICATION

N1.c N2.c N3.c N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing Dependency graph Communication and

N1 N2 N3 N4 N5

N6 N3

pre init

N1 N4

N2 N6 N5

Parallelism Extraction

void write_outputs_N2() { copy(N4.in2, N2.out);

copy(N5.in1, N2.out);

}

void wait_for_inputs_N4() {

}

I Efficient hardware synchronization

(22)

16/48 I NSTANTANEOUS C OMMUNICATION

N1.c N2.c N3.c N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing Dependency graph Communication and

N1 N2 N3 N4 N5

N6 N3

pre init

N1 N4

N2 N6 N5

Parallelism Extraction

void write_outputs_N2() { copy(N4.in2, N2.out);

channel N2 N4 = true;

// + cache management notify(core N4);

copy(N5.in1, N2.out);

}

void wait_for_inputs_N4() { while(! channel N2 N4 ) {

wait();

}

channel N2 N4 = false;

while(! channel_N1_N4) { wait();

}

channel_N1_N4 = false;

// + cache management }

I Efficient hardware synchronization

(23)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C OMMUNICATION C HANNELS

N1 N2 N3 N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing N3

N2

N1 N4

N5 N6 pre init

Parallelism Extraction

I Two kinds of communications:

- instantaneous (→)

- delayed (→)

(24)

D ELAYED C OMMUNICATION

pre

i1 i2

o1 init

b o2

A

B

S A B

Transformation into a SWAP + scheduling constraints.

instant n n+1

n-1 n+1

S A X

Y B

n+1 n-1

Y X

A

B S

instant n

Constraint: A→S Constraint: B→S

Scenario 1 Scenario 2

...

A.in ...

S.in S.in

A.in

(25)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

D ELAYED C OMMUNICATION

pre

i1 i2

o1 init

b o2

A

B

B

S A

Transformation into a SWAP + scheduling constraints.

instant n n+1

n-1 n+1

S A X

Y B

n+1 n-1

Y X

A

B S

instant n

Constraint: A→S Constraint: B→S

Scenario 1 Scenario 2

...

A.in ...

S.in S.in

A.in

(26)

D ELAYED C OMMUNICATION

pre

i1 i2

o1 init

b o2

A

B

A B

S

Transformation into a SWAP + scheduling constraints.

instant n n+1

n-1 n+1

S A X

Y B

n+1 n-1

Y X

A

B S

instant n

Constraint: A→S Constraint: B→S

Scenario 1 Scenario 2

...

A.in ...

S.in S.in

A.in

(27)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C ONCLUSION OF P ART I

N1 N2 N3 N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Executable for Kalray Contribution External tool

NoC Routing N3

N2

N1 N4

N5 N6 pre init

Parallelism Extraction

I Semantics preserved

I Next step: Temporal Guaranties

(28)

Part I: Semantics Preserving Parallelization

Part II: Real-Time Guarantees

I Shared Memory Interference I Time-Triggered Execution Model

I Real-Time Guarantees with Network-on-Chip

Part III: Evaluation and Conclusion

(29)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

I NTERFERENCE AND R EACTION T IME

Core 0

Core 1

Core 15

...

Task A

E M M

Task A Core 0

Core 1

WCET

I Single-core: WCET is sufficient

I Multi-core: WCET+interference on shared resources

= Worst-Case Response Time (WCRT).

I WCET+interference too pessimistic in the general case

exploit the execution model in the analysis

(30)

I NTERFERENCE AND R EACTION T IME

Core 0

Core 1

Core 15

...

Task A

E M

Task B M Request Access

Access Core 0

Core 1

Task A

WCET

Task B

Request

WCRT

I Single-core: WCET is sufficient

I Multi-core: WCET+interference on shared resources

= Worst-Case Response Time (WCRT).

I WCET+interference too pessimistic in the general case

exploit the execution model in the analysis

(31)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

B ANKED M EMORY

Round Robin Core 1

Core 0

Bank 1 Bank 0

... ...

I Private arbiter for each bank

I Available in the Kalray MPPA2 (silicon)

(32)

B ANKED M EMORY IN K ALRAY MPPA2

Node 3 Node 5

Node 1

Node 2

Node 4

Node 6

N1

Round Robin

N4 N6

N2 N5

Bank 0 for core 0

Bank 1 for Core 1

...

...

Core 0

Core 1

I Choice: Code, input buffer, local variables are mapped in core’s bank of the core

I Interference on communication only

(33)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

B ANKED M EMORY IN K ALRAY MPPA2

Node 6

Node 5 Node 4 Node 2

Node 1

Node 3

N1

Round Robin

N4 N6

N2 N5

Core 1 Core 0

Bank 0 for core 0

Bank 1 for Core 1

...

...

Core 0

Core 1

I Choice: Code, input buffer, local variables are mapped in core’s bank of the core

I Interference on communication only

(34)

B ANKED M EMORY IN K ALRAY MPPA2

Node 6

Node 5 Node 4 Node 2

Node 1

Node 3

N1

Round Robin

N4 N6

N2 N5

Communication

Bank 0 for core 0

Bank 1 for Core 1

...

...

Core 0

Core 1

I Choice: Code, input buffer, local variables are mapped in core’s bank of the core

I Interference on communication only

(35)

23/48

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

B ANKED M EMORY IN K ALRAY MPPA2

Node 6

Node 5 Node 4 Node 2

Node 1

Node 3

N1

Round Robin

N4 N6

N2 N5

Communication

Bank 0 for core 0

Bank 1 for Core 1

...

...

Core 0

Core 1

I Choice: Code, input buffer, local variables are mapped in core’s bank of the core I Interference on communication only

How to activate tasks?

(36)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

P ROBLEM W ITH ASAP E XECUTION

Core 0

Core 1

Task A Task B

WCET A WCET B

WCET C WCET D

Task C Task D

Deadline

Core 0

Core 1

WCET C Task A

Task C

WCET B WCET A

Task B WCET D

Task D Deadline

Faster execution of A ⇒ interference between B and C.

Timing Anomaly

Solution: release dates for tasks

(time-triggered)

(37)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

P ROBLEM W ITH ASAP E XECUTION

Core 0

Core 1

Task A Task B

WCET A WCET B

WCET C WCET D

Task C Task D

Deadline

Core 0

Core 1

WCET C Task A

Task C

WCET B WCET A

Task B WCET D

Task D Deadline

Faster execution of A ⇒ interference between B and C.

Timing Anomaly

Solution: release dates for tasks

(time-triggered)

(38)

P ROBLEM W ITH ASAP E XECUTION

Core 0

Core 1

Task A Task B

WCET A WCET B

WCET C WCET D

Task C Task D

Deadline

Core 0

Core 1

WCET C Task A

Task C

WCET B WCET A

Task B WCET D

Task D Deadline

Faster execution of A ⇒ interference between B and C.

Timing Anomaly

Solution: release dates for tasks

(time-triggered)

(39)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

T IME -T RIGGERED E XECUTION M ODEL

Principle

Core 0

Core 1

WCRT D Release = 0 Release = 427

Release = 398 Release = 0

Task A

WCRT A WCRT B

WCRT C

Task C Task D

Task B

I Compute a static release date when data is guaranteed to be available I Time-Triggered execution prevents tasks from starting earlier

Implementation void wait_inputs_N1() {

while(time() < t_period + release_date_N1){

// wait

}

(40)

M ULTI -C ORE I NTERFERENCE A NALYSIS

WCET Dependencies Memory access Mapping

MIA Release dates of tasks WCRT

Multi-Core Interference Analysis (MIA) Tool [Hamza Rihani, RTNS 2016]

www-verimag.imag.fr/Multi-core-interference-Analysis.html

(41)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

M ULTI -C ORE I NTERFERENCE A NALYSIS

WCET Dependencies Memory access Mapping

MIA Release dates of tasks WCRT

Multi-Core Interference Analysis (MIA) Tool [Hamza Rihani, RTNS 2016]

www-verimag.imag.fr/Multi-core-interference-Analysis.html

(42)

F RAMEWORK

(WCTT) N1 N2 N3

N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Network Calculus Executable for

Kalray

WCET Analysis

MIA Contribution External tool

NoC Routing + Rate Attribution N3

N2

N1 N4

N5 N6 pre init

Parallelism Extraction

release dates

I WCET Analysis (Otawa, AiT)

I Computation of the release dates (MIA tool)

I Insertion in the executable.

[Submitted: Graillat A., Rihani H., Maiza C.,

Moy M., Raymond P., Dupont de Dinechin B.,

Real-Time Systems Journal]

(43)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

O UR E XECUTION M ODEL

I Static scheduling

I Bare metal, no interrupt, no preemption I Banked memory mapping (1 core → 1 bank) I Time-triggered

Deterministic and WCRT guarantee

(44)

29/48 O VERVIEW

(WCTT) N1 N2 N3

N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

Code Generation System + Communication

Network Calculus Executable for

Kalray Contribution External tool

NoC Routing + Rate Attribution

WCET Analysis N3 N2

N1 N4

N5 N6

pre init

Parallelism Extraction

release dates

(45)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

T HE K ALRAY MPPA2’ S N O C

Core 0

Cluster A MEM

head route data

Core 0 TX buffer

North

Local

DMA East

Cluster B MEM

RX buffer head route data

Rate limiter

Router North East

flow 2

East, North, East, Local

(46)

T HE K ALRAY MPPA2’ S N O C

Core 0

Cluster A MEM

head route data

Core 0 TX buffer

North

East Local

DMA East

Cluster B MEM

RX buffer head route data

Rate limiter

Router North East, North, East, Local

flow 1

(47)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

T HE K ALRAY MPPA2’ S N O C

Core 0

Cluster A MEM

head route data

Core 0 TX buffer

North

East Local

DMA East

Cluster B MEM

RX buffer head route data

flow 1 Rate limiter

Router

North Full Duplex Links

flow 2

East, North, East, Local

(48)

T HE K ALRAY MPPA2’ S N O C

Core 0

Cluster A MEM

head route data

Core 0 TX buffer

North

East Local

DMA East

Cluster B MEM

RX buffer head route data

Rate limiter

Router North

East, North, East, Local flow 1

rate =

12

B flow 2

rate =

12

B

(49)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

4 5 6

7 8 9

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian) Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2}, {f1.2, f2.2}... 3. Rate attribution

Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}... 4. Network Calculus

Compute latency and buffer level

(50)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

4 5 6

7 8 9

East East South

South Local

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2}, {f1.2, f2.2}... 3. Rate attribution

Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}... 4. Network Calculus

Compute latency and buffer level

(51)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1

f2.1

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian)

Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2}, {f1.2, f2.2}... 3. Rate attribution

Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}... 4. Network Calculus

Compute latency and buffer level

(52)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian) Algorithm with path diversity (Hamiltonian Odd Even (HOE))

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2}, {f1.2, f2.2}... 3. Rate attribution

Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}... 4. Network Calculus

Compute latency and buffer level

(53)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian) Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness

e.g. {f1.1, f2.2}, {f1.2, f2.2}... 3. Rate attribution

Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}... 4. Network Calculus

Compute latency and buffer level

(54)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1

f2.2

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian) Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2},

3. Rate attribution Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}... 4. Network Calculus

Compute latency and buffer level

(55)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.2

f2.2

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian) Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2}, {f1.2, f2.2}...

3. Rate attribution Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}... 4. Network Calculus

Compute latency and buffer level

(56)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1

f2.2

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian) Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2}, {f1.2, f2.2}...

3. Rate attribution Fair attribution e.g. {0.5, 0.5},

4. Network Calculus

Compute latency and buffer level

(57)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.2

f2.2

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian) Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2}, {f1.2, f2.2}...

3. Rate attribution Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}...

4. Network Calculus

Compute latency and buffer level

(58)

R EAL -T IME G UARANTEES WITH N ETWORK - ON -C HIP

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.2

f2.2

[Dupont de Dinechin B., Graillat A., NoCArc 2017 ]

1. Routing

Route = sequence of directions computed by sender

Deadlock-free algorithms (XY, Hamiltonian) Algorithm with path diversity (Hamiltonian Odd Even (HOE))

2. Route Selection

Choose one static route per flow Optimize performance and fairness e.g. {f1.1, f2.2}, {f1.2, f2.2}...

3. Rate attribution Fair attribution

e.g. {0.5, 0.5}, {1.0, 1.0}...

4. Network Calculus

Compute latency and buffer level

(59)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

1. U NICAST R OUTING

Hamiltonian Routing

9 8 7

4 5 6

3 2 1

9 8 7

4 5 6

3 2 1

f1

Up-Network Down-Network

I Deadlock-free

I Path-Diversity

(60)

1. M ULTICAST R OUTING P ROBLEM

f1

9 8 7

4 5 6

3 2 1

Source Destination

I Deadlock-free heuristic to “travelling salesman problem”:

I One route? Tree is no possible with Kalray architecture

I Hamiltonian: minimum of two routes

(61)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

1. M ULTICAST R OUTING P ROBLEM

f1

9 8 7

4 5 6

3 2 1

Source Destination

I Deadlock-free heuristic to “travelling salesman problem”:

I One route? Tree is no possible with Kalray architecture

I Hamiltonian: minimum of two routes

(62)

1. M ULTICAST R OUTING : H AMILTONIAN

9 8 7

4 5 6

3 2 1

I Dual-Path + Hamiltonian without path diversity (Lin et al., 1992)

I Small path diversity, we can do better.

(63)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

1. M ULTICAST R OUTING : HOE

Allowed with HOE

9 8 7

4 5 6

3 2 1

Even

Odd

Even

I Hamiltonian Odd Even (Bahrebar et al.) routing: allows some non-Hamiltonian paths

I Our choice: Dual-Path + HOE

(64)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

E VALUATION OF D EADLOCK - FREE M ULTICAST R OUTING

Hamiltonian.

(65)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

E VALUATION OF D EADLOCK - FREE M ULTICAST R OUTING

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Dual Path+Hamiltonian Dual Path+HOE Minimum rate vs. Network load

(Higher is better)

Network Load Realistic network load

I Minimal rate increased of up to 19% with Dual Path HOE compared to Dual Path

(66)

2. R OUTE S ELECTION

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2 f2.3

f1.3

[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

I Input: set of route for each flow I Output: one route per flow I Maximize fairness

I Naive exploration: Enumeration all

combinations, run step 3 on each solution, keep the best.

f2.1 f2.2 f2.3

f1.1 0.5, 0.5 0.5, 0.5 0.5, 0.5 f1.2 0.5, 0.5 1.0, 1.0 1.0, 1.0 f1.3 1.0, 1.0 0.5, 0.5 1.0, 1.0

Enumeration of 9 combinations of possible routes

(67)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

2. R OUTE S ELECTION

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2 f2.3

f1.3

[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

I Input: set of route for each flow I Output: one route per flow I Maximize fairness

I Naive exploration: Enumeration all combinations (9 combinations)

I Exploration with pruning: Ignore combinations with non-minimal bottleneck

I Apply rate attribution (step 3)

f2.1 f2.2 f2.3

f1.1 f1.2

1.0, 1.0 1.0, 1.0

f1.3

1.0, 1.0 1.0, 1.0

Enumeration of 4 combinations

(68)

2. R OUTE S ELECTION

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2 f2.3

f1.3

[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

I Input: set of route for each flow I Output: one route per flow I Maximize fairness

I Naive exploration: Enumeration all combinations (9 combinations)

I Exploration with pruning: Ignore combinations with non-minimal bottleneck

I Apply rate attribution (step 3)

f2.1 f2.2 f2.3

f1.1

f1.2 1.0, 1.0 1.0, 1.0

f1.3 1.0, 1.0 1.0, 1.0

Enumeration of 4 combinations

(69)

39/48

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

Exploration vs. Our Exploration with Pruning

I Exploration with pruning (EURS)

(70)

39/48

Exploration vs. Our Exploration with Pruning

I Exploration with pruning (EURS)

(71)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

2. R OUTE S ELECTION

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2 f2.3

f1.3

[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

I Input: set of route for each flow I Output: one route per flow I Maximize fairness

I Naive exploration: Enumeration all combinations (9 combinations)

I Our Exploration with pruning: Ignore non-minimal bottleneck (4 combinations) I Our LP-based Heuristic:

I Consider all alternative routes at once 1. Attribute fair rates

2. Remove smallest alternative routes

3. Enumerate the 3 combinations

(72)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

2. R OUTE S ELECTION

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2 f1.3

1 4

1 4

1 4

1 4 1

2

1 4

f2.3

[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

I Input: set of route for each flow I Output: one route per flow I Maximize fairness

I Naive exploration: Enumeration all combinations (9 combinations)

I Our Exploration with pruning: Ignore non-minimal bottleneck (4 combinations) I Our LP-based Heuristic:

I Consider all alternative routes at once 1. Attribute fair rates

3. Enumerate the 3 combinations

(73)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

2. R OUTE S ELECTION

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2 f1.3

1 4

1 4

1 4

1 4 1

2

1 4

f2.3

[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

I Input: set of route for each flow I Output: one route per flow I Maximize fairness

I Naive exploration: Enumeration all combinations (9 combinations)

I Our Exploration with pruning: Ignore non-minimal bottleneck (4 combinations) I Our LP-based Heuristic:

I Consider all alternative routes at once 1. Attribute fair rates

2. Remove smallest alternative routes

3. Enumerate the 3 combinations

(74)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

2. R OUTE S ELECTION

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2 f1.3

f2.3

2 3

1 3

1 3

1 3

[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

I Input: set of route for each flow I Output: one route per flow I Maximize fairness

I Naive exploration: Enumeration all combinations (9 combinations)

I Our Exploration with pruning: Ignore non-minimal bottleneck (4 combinations) I Our LP-based Heuristic:

I Consider all alternative routes at once 1. Attribute fair rates

2. Remove smallest alternative routes

(75)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

2. R OUTE S ELECTION

cluster router f1

1 2 3

f2

4 5 6

7 8 9

f1.1 f1.2

f2.1 f2.2 f1.3

f2.3

2 3

1 3

1 3

1 3

[Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018]

I Input: set of route for each flow I Output: one route per flow I Maximize fairness

I Naive exploration: Enumeration all combinations (9 combinations)

I Our Exploration with pruning: Ignore non-minimal bottleneck (4 combinations) I Our LP-based Heuristic:

I Consider all alternative routes at once 1. Attribute fair rates

2. Remove smallest alternative routes

3. Enumerate the 3 combinations

(76)

E VALUATION OF O UR LP- BASED H EURISTIC

(77)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

E VALUATION OF O UR LP- BASED H EURISTIC

(78)

LP- BASED H EURISTIC VS . O PTIMAL E XPLORATION

I Optimal algorithms (100%): naive exploration, exploration with pruning

I Minimal rate as indicator of fairness

(79)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

Part I: Semantics Preserving Parallelization

Part II: Real-Time Guarantees

Part III: Evaluation and Conclusion

I Use Cases and Evaluation

I Conclusion

(80)

T HE ROSACE C ASE S TUDY

I Altitude-only flight controller

I Open source (Simulink, Lustre, Giotto) [Pagetti, Soussi ´e, RTAS’14]

Best effort Sequential

pure +100 cycles +200 cycles I +100: each task augmented with 100 cycles

I +200: each task augmented with 200 cycles

I Best effort (ASAP): no real-time guarantees

I Time-Triggered (TT): real-time guarantees

(81)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

T HE ROSACE C ASE S TUDY

I Altitude-only flight controller

I Open source (Simulink, Lustre, Giotto) [Pagetti, Soussi ´e, RTAS’14]

Best effort Sequential

pure +100 cycles +200 cycles

÷1.8

÷2.1

I +100: each task augmented with 100 cycles

I +200: each task augmented with 200 cycles

I Best effort (ASAP): no real-time guarantees

I Time-Triggered (TT): real-time guarantees

(82)

S YNTHETIC B ENCHMARK ON 64 CORES

I 3 phases: Dispatch, Compute, Gather

I 20 Bytes per flow, high network congestion for gather phase

0 2

8 10

4 6

12 14

1 3

9 11

5 7

13 15

Dispatch

Compute

0 2

8 10

4 6

12 14

1 3

9 11

5 7

13 15

Gather

(83)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

S YNTHETIC B ENCHMARK ON 64 CORES

I 3 phases: Dispatch, Compute, Gather

I 20 Bytes per flow, high network congestion for gather phase

I 54% of WCRT for functional code

(84)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C ONCLUSION

(WCTT) N1 N2 N3

N4 N5

N6 Dependency graph Communication and N1.c N2.c N3.c

N4.c N5.c N6.c functional code

scheduling Non-preemptive

Mapping +

MIA Contribution External tool

NoC Routing + Rate Attribution

Executable for Kalray

WCET Analysis Code Generation System + Communication

Network Calculus N3

N2

N1 N4

N5 N6

pre init

Parallelism Extraction

release dates

(85)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C ONCLUSION

(WCTT) N1 N2 N3

N4 N5

N6 Dependency graph Communication and

MIA Contrib

Externa

NoC Routing + Rate Attribution

Executable for Kalray

WCET Analysis

Network Calculus N6

init

Parallelism Extraction

release dates

Generation Code Preserving

Semantics

Bridge between academic and industry

(86)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C ONCLUSION

(WCTT) N1 N2 N3

N4 N5

N6 Dependency graph Communication and Contrib

Externa

NoC Routing + Rate Attribution

Network Calculus N6

init

Parallelism Extraction

release dates

Time-triggered Execution

Model Generation

Code Preserving

Semantics

(87)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

C ONCLUSION

Contrib Externa

release dates

Time-triggered Execution

Model

Hard realtime Guarantees

Many-Core Performance Generation

Code Preserving

Semantics

Bridge between academic and industry

(88)

C ONCLUSION

Contrib Externa

release dates

Time-triggered Execution

Model

Hard realtime Guarantees

Many-Core Performance Generation

Code Preserving

Semantics

Bridge between academic and industry

(89)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

F UTURE W ORK

Generation Code Preserving

Semantics

Hard realtime Guarantees

Many-Core Performance Time-triggered

Execution Model

Thank you for your attention. Questions?

(90)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

F UTURE W ORK

Release date computation and memory interferences

analysis with NoC

Generation Code Preserving

Semantics

Hard realtime Guarantees

Many-Core Performance Time-triggered

Execution Model

Thank you for your attention. Questions?

(91)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

F UTURE W ORK

Release date computation and memory interferences

analysis with NoC Memory size problem (overlays?)

Generation Code Preserving

Semantics

Hard realtime Guarantees

Many-Core Performance Time-triggered

Execution Model

Thank you for your attention. Questions?

(92)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

F UTURE W ORK

Release date computation and memory interferences

analysis with NoC Memory size problem (overlays?) Input/Output Management

Generation Code Preserving

Semantics

Hard realtime Guarantees

Many-Core Performance Time-triggered

Execution Model

Thank you for your attention. Questions?

(93)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

F UTURE W ORK

Release date computation and memory interferences

analysis with NoC Memory size problem (overlays?) Input/Output Management

What should be the standard real-time multi-core processor?

Generation Code Preserving

Semantics

Hard realtime Guarantees

Many-Core Performance Time-triggered

Execution Model

Thank you for your attention. Questions?

(94)

F UTURE W ORK

Release date computation and memory interferences

analysis with NoC Memory size problem (overlays?) Input/Output Management

What should be the standard real-time multi-core processor?

Generation Code Preserving

Semantics

Hard realtime Guarantees

Many-Core Performance Time-triggered

Execution Model

Thank you for your attention. Questions?

(95)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

Published

Graillat A., Dupont de Dinechin B., DATE 2018

Parallel Code Generation of Synchronous Programs for a Many-core Architecture.

Boyer M., Dupont de Dinechin B., Graillat A., Havet L, ERTS 2018

Computing Routes and Delay Bounds for the Network-on-Chip of the Kalray MPPA2 Processor.

Dupont de Dinechin B., Graillat A., NoCArc 2017

Feed-Forward Routing for the Wormhole Switching Network-on-Chip of the Kalray MPPA2-256 Processor.

Dupont de Dinechin B., Graillat A., AISTECS 2017

Network-on-Chip Service Guarantees on the Kalray MPPA-256 Bostan Processor.

Submitted

Graillat A., Rihani H., Maiza C., Moy M., Raymond P., Dupont de Dinechin B., Real-Time Systems Journal

Implementation Framework for Real-Time Data-Flow Synchronous Programs on Many-Cores.

(96)

R EFERENCES

(97)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

Backup

(98)

D EADLOCK IN W ORMHOLE N ETWORKS

R1 R2

R4 R3 A

B

I This instance of flows deadlocks

I A wormhole packet is “spread” along the route

I Links 1-4 and 3-2 are shared I A holds 1-4 but waits for 3-2 I B holds 3-2 but waits for 1-4 I Deadlock-freeness can be ensured at routing time

I Solutions: XY, Hamiltonian Odd-Even, Turn Prohibition, etc

(99)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

M AX M IN F AIR R ATE A TTRIBUTION

0 2 3

4 5 6 7

8 9 10 11

13 14 15

12 1

cluster router

f3 f4

f1 f2

I Rate limiter configuration to avoid buffer overflow I Rate of a flow f

i

noted ρ

i

I Valid: for each link, P

i

ρ

i

≤ 1 flit/cycle

I Fair attribution: max-min fairness [? ]: “cannot increase a rate without decreasing an already smaller (or equal) rate”

Example 1:

f1 =

12

, f2 =

12

, f3 =

14

, f4 =

14

→ Valid but not max-min fair (since increasing f3 or f4 can be done by reducing the greater f2)

Example 2:

f1 =

23

, f2 =

13

, f3 =

13

, f4 =

13

→ Valid and max-min fair (increasing f2, f3 or f4 cannot be done without reducing one of them)

Classical solution: the “Water Filling” algorithm [? ]

(100)

Intro Parallelism Extraction Mapping/Scheduling Code Generation Communication Interference Real-Time NoC Evaluation Conclusion Publications References

M AX M IN F AIR R ATE A TTRIBUTION

0 2 3

4 5 6 7

8 9 10 11

13 14 15

12 1

cluster router

f3 f4

f1 f2

I Rate limiter configuration to avoid buffer overflow I Rate of a flow f

i

noted ρ

i

I Valid: for each link, P

i

ρ

i

≤ 1 flit/cycle

I Fair attribution: max-min fairness [? ]: “cannot increase a rate without decreasing an already smaller (or equal) rate”

Example 1:

f1 =

12

, f2 =

12

, f3 =

14

, f4 =

14

be done by reducing the greater f2) Example 2:

f1 =

23

, f2 =

13

, f3 =

13

, f4 =

13

→ Valid and max-min fair (increasing f2, f3 or f4 cannot be done without reducing one of them)

Classical solution: the “Water Filling” algorithm [? ]

Références

Documents relatifs

This article presents a compilation process to automatically generate a parallel task code executable onto distributed mem- ory machines from a sequential code. This compilation

To understand the behavior of a clocked program, one has to count the number of instances of clock advance instructions since the creation of each activity.. Advance instruc- tions

In Section 3, we present dierent loop parallelization algorithms proposed in the literature, recalling the dependence representations they use, the loop transformations they

Code generation from an interface based hierarchy tar- gets embedded application by generating static C code from a scheduling/mapping of the graph for a given architecture.. One C

Quarry map showing the original location of the juvenile Diplodocus skull (CM 11255) relative to other dinosaurs excavated from Carnegie Quarry at Dinosaur National Monument.

Structure of the generated code for the testbenchs To facilitate the debug of a RVC-CAL model, a test bench is generated for each instance (i.e. actors and networks). As presented

La figure 2a représente la chambre de combustion comme une sorte de réacteur chimique avec une entrée (flèche verte représentant l'injection) et une sortie

Given a mapping (tasks to cores), we automatically generate code suitable for the targeted many-core architecture. This minimizes memory interferences and allows usage of a framework