Response Time Analysis of Dataflow Applications on a Many-Core Processor
with Shared-Memory and Network-on-Chip
Matthieu Moy
LIP (Univ. Lyon 1)
November 2018
CASH: Topics - People
Optimized (software/hardware) compilation for HPC software with data-intensive computations.
Means: dataflow IR, static analyses, optimisations, simulation.
[Figure: CASH compilation flow. Sequential programs go through parallelism extraction, and parallel programs (HPC applications) enter directly, to reach an intermediate parallel representation with dataflow semantics; code generation then targets hardware (FPGA) and software (CPU & accelerators). Supporting techniques: optimization, analysis, abstract interpretation, simulation, polyhedral model.]
Christophe Alias, Laure Gonnord, Ludovic Henrio, Matthieu Moy
http://www.ens-lyon.fr/LIP/CASH/
Outline
1 Critical, Real-Time and Many-Core
2 Parallel code generation and analysis
3 Models Definition
4 Interferences and NoC Communications
5 Evaluation
6 Conclusion and Future Work
Time-critical, compute intensive applications
◦ Hard Real-Time: we must guarantee that task execution completes before deadline
◦ Compute-intensive
Performance vs. Predictability
[Figure: processors placed between "predictable" and "fast"; examples shown: 68000, PowerPC, i7, GPU, many-core.]
Many-core
Lots of simple cores
Kalray MPPA (Massively Parallel Processor Array):
◦ 256 cores
◦ No cache consistency
◦ No out-of-order execution
◦ No branch prediction
◦ No timing anomaly
◦ Predictable NoC
⇒ good fit for real-time?
Kalray's business model
Hard Real-Time on Many-Core
◦ High-level Data-Flow Application Model
  Synchronous hypothesis: computation/communication in 0-time
◦ Network on Chip
  Communication takes time
◦ Shared Memory within Cluster
  Interferences between tasks
◦ Individual Cores
  Cache, Pipeline, ...
[Figure: data-flow graph with inputs I1, I2, tasks T1, T2, T3 and outputs O1, O2.]
Take into account all levels in Worst-Case Execution Time (WCET) analysis and programming model
2 Parallel code generation and analysis
Execution of Synchronous Data Flow Programs
[Figure: high-level representation as a task graph with inputs i0, i1, nodes NA, NB, NC, ND, NE, NF (tasks τ0 to τ5) and output o.]
✓ Respect the dependency constraints
✓ Set the release dates to get precise upper bounds on the interference
Code generation, single-core: static non-preemptive scheduling
Industrialized as SCADE (1993), heavily used in avionics and nuclear
    /* Sequential code for one core: one call per node, in an order that
       respects the data dependencies. The int types are illustrative. */
    int main_app(int i1, int i2)
    {
        int na = NA(i1);
        int ne = NE(i2);
        int nb = NB(na);
        int nd = ND(na);
        int nf = NF(ne);
        int o  = NC(nb, nd, nf);
        return o;
    }
Execution of Synchronous Data Flow Programs
Code generation, multi/many-core: static non-preemptive scheduling
    int NF(...) { /* task τ6 */ return (...); }
    int NE(...) { /* task τ5 */ return (...); }
    int ND(...) { /* task τ4 */ return (...); }
    int NC(...) { /* task τ3 */ return (...); }
    int NB(...) { /* task τ2 */ return (...); }
    int NA(...) { /* task τ1 */ return (...); }
[Figure: static schedule of tasks τ0 to τ5 on cores PE0, PE1 and PE2; each task τi occupies a window of length wcrti.]
Parallel code generation from Lustre/SCADE (pseudo-code)
[Figure: task graph (nodes NA to NF) and its static schedule on PE0, PE1, PE2.]
    // Generated by SCADE KCG
    void NA(ctx_a *ctx) {
        // ... computation ...
    }

    void NA_wrapper(ctx_a *ctx) {
        RECV_NA(i0);
        NA(ctx);
        SEND_NA_NB(...);
    }

    // Generated by us
    void worker_PE0(void) {
        ctx_a ctxa;
        ctx_b ctxb;
        while (1) {
            NA_wrapper(&ctxa);
            wait(release_t2);
            NB_wrapper(&ctxb);
            wait(end_of_period);
        }
    }

    #define RECV_NA(data) ...
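The worker loop above is time-triggered: each task starts at a precomputed release date, never earlier. A minimal sketch of that behaviour follows, using a simulated clock instead of a hardware timer. The names sim_now, wait_until, slot_t and run_period are hypothetical and not part of the generated code. The durations are made-up values that stand for actual execution times.

```c
#include <assert.h>

/* Hypothetical simulated clock; a real worker would read a hardware timer. */
static long sim_now = 0;

/* Stand-in for wait(date): a time-triggered wait never returns before `date`.
   The assert flags a task that overran its slot (schedule no longer valid). */
static void wait_until(long date)
{
    assert(sim_now <= date);
    sim_now = date;
}

/* One entry of the static schedule of a core (all names are illustrative). */
typedef struct {
    const char *name;   /* task label, e.g. "NA" */
    long release;       /* release date, relative to the period start */
    long duration;      /* simulated execution time, must not exceed the WCRT */
} slot_t;

/* Runs one period of a static, non-preemptive schedule on one core and
   returns the instant at which the last task finished. */
long run_period(const slot_t *slots, int n, long start, long period)
{
    sim_now = start;
    for (int i = 0; i < n; i++) {
        wait_until(start + slots[i].release); /* like wait(release_t2) */
        sim_now += slots[i].duration;         /* task body runs to completion */
    }
    long finish = sim_now;
    wait_until(start + period);               /* like wait(end_of_period) */
    return finish;
}
```

For example, with NA released at 0 and running for 30, and NB released at 50 and running for 40, a period of 100 starting at 0 finishes its last task at 90 and the worker then idles until 100.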
Contribution
◦ Previous work:
  ◦ Predictable execution model within each cluster
  ◦ Mathematical model of arbitration for memory accesses
  ◦ Algorithm to compute a time-triggered schedule (fixed-point resolution)
◦ This talk:
  ◦ Multi-cluster application
  ◦ Time-triggered schedule taking Network on Chip (NoC) accesses into account
3 Models Definition
Architecture Model
[Figure: chip layout with I/O clusters (Ethernet 0, Ethernet 1, DDR 0, DDR 1) and compute clusters.]
◦ Kalray MPPA256 Bostan
◦ 16 compute clusters + 4 I/O clusters
◦ Dual NoC (Network on Chip)
Per cluster:
◦ 16 cores (P0 to P15) + 1 Resource Manager (RM)
◦ NoC Tx, NoC Rx, Debug Unit (DSU)
◦ 16 shared memory banks (total size: 2 MB)
◦ Multi-level bus arbiter per memory bank
[Figure: multi-level arbiter of one shared memory bank: round-robin (RR) stages 16→1, 3→1 and 2→1, followed by a fixed-priority (FP) stage with a high-priority input; requesters P0 to P15, RM, DSU, NoC Rx, NoC Tx; groups labelled G1, G2, G3.]
Execution Model: Within a Cluster
[Figure: the task graph (nodes NA to NF) mapped onto one cluster; cores P0, P1, P2 reach memory banks b0, b1, b2 (128 KB each) through one arbiter per bank.]
◦ Tasks mapping on cores
◦ Static non-preemptive scheduling
◦ Spatial isolation: different tasks go to different memory banks
◦ Interference from communications
◦ Execution model:
  ◦ execute in a local bank
  ◦ write to a remote bank
NoC Communications
[Figure: two clusters, each with cores P0, P1, memory banks b0, b1 and NoC RX/TX interfaces, connected through the NoC; arrows numbered 1 to 5 follow the steps below.]
Steps:
1 Read from memory
2 Write to TX's buffer
3 Start NoC transfer
4 Data transmission through the NoC
5 Write to memory
Interference:
1 Same as other reads
2, 3 One TX channel per sender ⇒ independent accesses
4 Interferences in each router → network calculus
5 High-priority interference ⇒ B
Application Model and Interferences
[Figure: task graph with inputs i0, i1, nodes NA to NF (tasks τ0 to τ5) and output o.]
◦ Directed Acyclic Task Graph
◦ Mono-rate
◦ Fixed mapping and execution order
◦ For each task τ_i: $WCRT_i = WCET_i + \sum_{j \neq i} \mathit{interference}_{i,j}$
[Figure: timeline (t from 0 to 160) of a task released at rel_i, showing processor demand and memory/TX access time; the response time R_i is longer with interference than in isolation.]
Previous work:
◦ Find R_i (including the interference)
◦ Find rel_i respecting precedence constraints
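The two quantities on this slide can be related with a small sketch: once each WCRT_i is known, a release date that respects precedence is the latest finish time among the predecessors. The sketch assumes the interference term of each task is already given. In the actual analysis it depends on which tasks overlap in time, and hence on the release dates themselves. The names task_t, task_wcrt and compute_release_dates are hypothetical.

```c
#include <assert.h>

#define MAX_PREDS 4

/* Illustrative task record; tasks are stored in a topological order. */
typedef struct {
    long wcet;              /* WCET_i, in isolation */
    long interference;      /* sum over j != i of interference(i, j), given */
    int  npreds;            /* number of predecessors */
    int  preds[MAX_PREDS];  /* indices of predecessors, each < own index */
} task_t;

/* WCRT_i = WCET_i + sum of interference, as on the slide. */
static long task_wcrt(const task_t *t)
{
    return t->wcet + t->interference;
}

/* rel_i = max over predecessors j of (rel_j + WCRT_j); 0 without predecessor.
   Same-core ordering can be encoded as an extra predecessor. */
void compute_release_dates(const task_t *tasks, int n, long *rel)
{
    for (int i = 0; i < n; i++) {
        rel[i] = 0;
        for (int k = 0; k < tasks[i].npreds; k++) {
            int j = tasks[i].preds[k];
            assert(j >= 0 && j < i);        /* topological order required */
            long ready = rel[j] + task_wcrt(&tasks[j]);
            if (ready > rel[i])
                rel[i] = ready;
        }
    }
}
```

With three made-up tasks (WCRT 110, 80 and 70, the second depending on the first and the third on both), the release dates come out as 0, 110 and 190.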
4 Interferences and NoC Communications
Reminder: NoC Communications
[Figure: NoC transfer between two clusters (cores P0, P1, banks b0, b1, RX/TX), steps numbered 1 to 5.]
Interference:
1 Same as other reads
2, 3 One TX channel per sender ⇒ independent accesses
4 Interferences in each router → network calculus
5 High-priority interference ⇒ B
Issue:
Predict the possible execution time of 5 as precisely as possible.
Example Tasks with NoC Transmission
Issue 1: overapproximation of RX execution interval
[Figure: timelines of P0 and RX on Cluster 0 and Cluster 1; Task 1 sends to Task 2 through a NoC reception task; after the split, Task 1 is followed by Cpy and EOT.]
◦ Issue:
  ◦ NoC reception starts after BCET (computation before sending) + BCET (transmission of first flit), where BCET is the Best-Case Execution Time
  ◦ We don't have the BCET
  ⇒ large overapproximation for RX tasks
◦ Solution: Split sending task
  ◦ Compute: no NoC access
  ◦ Copy to TX (Cpy): write to the TX's buffer
  ◦ Start NoC transfer (EOT): write to TX's control register
Example Tasks with NoC Transmission
Issue 2: circular dependency
[Figure: timelines of P0 and RX on Cluster 0 and Cluster 1; Task 1, Cpy and EOT1 feed NoC reception RX1; Task 2, Cpy and EOT2 feed NoC reception RX2.]
◦ Issue:
  ◦ WCRT(RX1) = WCRT(EOT1) + WCTT(1→0)
  ◦ WCRT(EOT1) depends on WCRT(RX2), which depends on WCRT(EOT2), which depends on WCRT(RX1)
  ◦ ⇒ fixed-point
◦ Solution: get rid of interference on EOT
  ◦ EOT = only one control register access
  ◦ Preload code to avoid instruction cache miss
3-phase Tasks Analysis
◦ Compute:
  ◦ Fits in previous work model
◦ Copy to TX:
  ◦ Force non-interfering schedule (add artificial dependencies if needed)
◦ Start NoC transfer (EOT):
  ◦ No interference
◦ On the RX side:
  ◦ RX can only start after Start NoC transfer has started
  ⇒ edge from Copy to TX to RX in the task dependency graph.
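A small sketch of how the split might be represented: the sending task becomes three phases, and the Cpy → RX edge fixes the earliest release of the reception task. The types phase_t and sender_t and both functions are hypothetical. The WCTT is taken as an input, since computing it (network calculus) is outside this sketch.

```c
#include <assert.h>

/* Release date and WCRT of one phase; field names are illustrative. */
typedef struct { long release; long wcrt; } phase_t;

/* A sending task split into its three phases. */
typedef struct { phase_t compute, cpy, eot; } sender_t;

/* Edge Cpy -> RX: RX is released once the copy to the TX buffer is complete,
   which is also the earliest instant at which EOT can be released. */
long rx_release(const sender_t *s)
{
    return s->cpy.release + s->cpy.wcrt;
}

/* Bound from the slides: WCRT(RX) = WCRT(EOT) + WCTT(src -> dst).
   `wctt` is supplied by the caller. */
long rx_wcrt(const sender_t *s, long wctt)
{
    assert(wctt >= 0);
    return s->eot.wcrt + wctt;
}
```

With the Cpy1 row of the Recap table (release 110, WCRT 110), rx_release gives 220, which is the release date listed there for EOT1 and for RX1 in the improved schedule.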
5 Evaluation
Example Application: Naive Schedule
[Figure: Gantt chart of the naive schedule on P0, P1 and RX of Cluster 0 and Cluster 1, with tasks n1 to n10, Cpy1, Cpy2, EOT1, EOT2, RX1 and RX2.]
Example Application: Improved Schedule
[Figure: Gantt chart of the improved schedule on P0, P1 and RX of Cluster 0 and Cluster 1, with tasks n1 to n10, Cpy1, Cpy2, EOT1, EOT2, RX1 and RX2.]
Recap
Task         WCRT               Release date
             Naive   Improved   Naive   Improved
Application  680     650
RX1          230     120        0       220
RX2          580     350        0       200
n3           180     160        0       0
n4           120     120        180     160
n2           100     100        10      0
n5           100     100        580     550
n8           100     100        330     340
n1           110     110        0       0
n6           100     100        0       0
n7           100     100        100     100
n9           100     100        220     220
n10          100     100        240     240
Cpy1         110     110        110     110
Cpy2         100     100        100     100
EOT1         20      20         220     220
EOT2         20      20         200     200
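As a consistency check on the table, the application figure can be recomputed as the latest finish time, max over tasks of (release date + WCRT). The sketch below uses the table's values verbatim. The names row_t, recap and app_response_time are hypothetical, and reading the application WCRT as this maximum is an interpretation of the table rather than something the slide states.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the Recap table: WCRT and release date for both schedules. */
typedef struct {
    const char *name;
    int wcrt_naive, wcrt_improved;
    int rel_naive, rel_improved;
} row_t;

static const row_t recap[] = {
    {"RX1", 230, 120,   0, 220}, {"RX2", 580, 350,   0, 200},
    {"n3",  180, 160,   0,   0}, {"n4",  120, 120, 180, 160},
    {"n2",  100, 100,  10,   0}, {"n5",  100, 100, 580, 550},
    {"n8",  100, 100, 330, 340}, {"n1",  110, 110,   0,   0},
    {"n6",  100, 100,   0,   0}, {"n7",  100, 100, 100, 100},
    {"n9",  100, 100, 220, 220}, {"n10", 100, 100, 240, 240},
    {"Cpy1", 110, 110, 110, 110}, {"Cpy2", 100, 100, 100, 100},
    {"EOT1",  20,  20, 220, 220}, {"EOT2",  20,  20, 200, 200},
};

/* Latest finish time over all tasks; `improved` selects the schedule. */
int app_response_time(int improved)
{
    int latest = 0;
    for (size_t i = 0; i < sizeof recap / sizeof recap[0]; i++) {
        int end = improved ? recap[i].rel_improved + recap[i].wcrt_improved
                           : recap[i].rel_naive + recap[i].wcrt_naive;
        if (end > latest)
            latest = end;
    }
    return latest;
}
```

In both schedules the maximum is reached by n5 (580 + 100 and 550 + 100), matching the Application row: 680 for the naive schedule and 650 for the improved one.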
Conclusion and Future Work
◦ Code generation and real-time analysis for many-core (Kalray MPPA256)
  = major challenge for industry and research
◦ Hard Real-Time ⇒ simplicity, predictability ⇒ static, time-driven schedule
◦ Critical ⇒ traceability ⇒ no aggressive optimization
◦ Our work:
  ◦ Understand and model the precise architecture of MPPA
  ◦ Extension of Multi-Core Response Time Analysis framework
  ◦ Integration analysis ↔ code generation
◦ Future Work:
  ◦ Model Kalray MPPA3 chip (new NoC, new arbiters)
  ◦ Improve the static scheduling algorithm: O(n^4) currently, we can do better.
  ◦ Integration with RTOS?
BACKUP
Multicore Response Time Analysis
Example: Fixed Priority bus arbiter, PE1 > PE0; bus access delay = 10
[Figure: timeline (t from 0 to 160) of PE0 and PE1; PE1 runs successive instances of T1 with 2 accesses each; PE0 runs T0 with 3 accesses; the response time of T0 is marked.]
◦ Task of interest running on PE0:
  R0 = 10 + 3×10 = 40 (response time in isolation)
  R1 = 10 + 3×10 + 2×10 = 60
  R2 = 10 + 3×10 + 2×10 + 2×10 = 80
  R3 = 10 + 3×10 + 2×10 + 2×10 + 0 = 80 (fixed-point)
Reference: Altmeyer et al., RTNS 2015
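The iteration R0, R1, R2, R3 above can be written as a fixed-point loop. This is a minimal sketch of the example only, not the analysis framework itself. It assumes the higher-priority task T1 on PE1 is released every 40 time units, a value chosen to match the figure's axis ticks and the slide's numbers. All identifiers are hypothetical.

```c
#include <assert.h>

/* Parameters of the slide's example. HP_PERIOD is an assumption. */
enum {
    COMPUTE = 10,            /* computation time of the task on PE0 */
    OWN_ACCESSES = 3,        /* bus accesses of the task on PE0 */
    BUS_DELAY = 10,          /* delay of one bus access */
    HP_ACCESSES_PER_JOB = 2, /* accesses of each T1 instance on PE1 */
    HP_PERIOD = 40           /* assumed release period of T1 */
};

/* One iteration: response time given a window of length r, counting the
   accesses of every T1 instance released in [0, r). */
int response_time_step(int r)
{
    assert(r > 0);
    int hp_jobs = (r + HP_PERIOD - 1) / HP_PERIOD;   /* ceil(r / period) */
    return COMPUTE + OWN_ACCESSES * BUS_DELAY
         + hp_jobs * HP_ACCESSES_PER_JOB * BUS_DELAY;
}

/* Iterates from the response time in isolation until the value is stable.
   Returns -1 if no fixed point is reached within max_iter iterations. */
int response_time_fixed_point(int max_iter)
{
    int r = COMPUTE + OWN_ACCESSES * BUS_DELAY;      /* R0 = 40 */
    for (int k = 0; k < max_iter; k++) {
        int next = response_time_step(r);
        if (next == r)
            return r;
        r = next;
    }
    return -1;
}
```

Starting from 40, the steps give 60, then 80, then 80 again, so the loop stops at 80 as on the slide.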
The Global Picture
[Figure: overall flow linking the high-level program, code generation (tasks, dependencies), static mapping/scheduling (mapping, execution order, release dates), binary generation, local WCRT analysis with timing models (static analysis) or probabilistic models (tasks WCRT, WC access), and WCRT with interferences.]