HAL Id: hal-01610745
https://hal.inria.fr/hal-01610745
Submitted on 5 Oct 2017
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Poster: Fault-Tolerant Multi-Processor Scheduling with Backup Copy Technique
Petr Dobiáš, Emmanuel Casseau, Oliver Sinnen
To cite this version:
Petr Dobiáš, Emmanuel Casseau, Oliver Sinnen. Poster: Fault-Tolerant Multi-Processor Scheduling with Backup Copy Technique. Conference on Design and Architectures for Signal and Image Processing (DASIP), Sep 2017, Dresden, Germany. ⟨hal-01610745⟩
emmanuel.casseau@irisa.fr
Aim: Development of a run-time, self-adaptive scheduling algorithm for Multi-Processor Systems-on-Chip that takes faults into account and thereby copes with unreliable system components
Keywords: Fault-tolerant design, Mapping and scheduling, Multi-processor platform
Context:
- Increasing demand for high performance and low power consumption in multi-processor systems
- Rising susceptibility to system failure due to transistor scaling and reduced operating voltage
Issues:
- Trade-off between parallel computing (performance) and spatial redundancy on multi-cores (reliability)
- Mapping and scheduling control on multiprocessor platforms
Assumptions:
- P identical processors
- Aperiodic and independent tasks
- Dynamic mapping and scheduling without preemption
- Only one processor failure at any instant of time
- Primary/backup (PB) approach: each task T has two identical copies, a primary copy (PC) and a backup copy (BC)
Principle of Mapping and Scheduling Based on the Primary/Backup Approach
Figure: System model. Three example schedules on processors P1 to P3 (time axis in arbitrary units, AU, from 0 to 15):
- time = 2: mapping and scheduling of a new task T2 (PC_1 and BC_1 already scheduled; PC_2 and BC_2 are placed)
- time = 3: mapping and scheduling of a new task T3 (PC_3 and BC_3 are placed)
- time = 5: fault management after execution of PC_1 (PC_1 detected as correct: deallocate BC_1)

Algorithm to map and schedule tasks:
- Map and schedule the PC (as soon as possible)
- Map and schedule the BC (as late as possible)
- PC and BC schedules found? YES: commit the task. NO: reject the task.

Fault management algorithm:
- After execution of the PC, the fault detection mechanism reports its result
- Faulty execution of the PC? NO: deallocate the BC. YES: wait for the results of the BC.
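The map-and-schedule flow (PC as soon as possible, BC as late as possible, commit or reject) and the fault management step can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Slot`, `Processor`, `schedule_task` and `on_pc_finished` are hypothetical, and the sketch assumes that the BC runs on a different processor than the PC and starts no earlier than the end of the PC, which the poster does not state explicitly.

```python
from dataclasses import dataclass, field

@dataclass
class Slot:
    start: int
    end: int
    label: str

@dataclass
class Processor:
    slots: list = field(default_factory=list)

    def is_free(self, start, end):
        # True if [start, end) overlaps no reserved slot (no preemption)
        return all(end <= s.start or start >= s.end for s in self.slots)

    def earliest_start(self, ready, length, deadline):
        # An earliest feasible start is either `ready` or the end of a slot
        candidates = sorted({ready} | {s.end for s in self.slots if s.end > ready})
        for t in candidates:
            if t + length <= deadline and self.is_free(t, t + length):
                return t
        return None

    def latest_start(self, ready, length, deadline):
        # A latest feasible start ends at the deadline or at the start of a slot
        limit = deadline - length
        candidates = sorted(
            {limit} | {s.start - length for s in self.slots if s.start - length < limit},
            reverse=True,
        )
        for t in candidates:
            if t >= ready and self.is_free(t, t + length):
                return t
        return None

def schedule_task(processors, name, arrival, comp, deadline):
    """Return (pc_proc, pc_start, bc_proc, bc_start), or None if rejected."""
    # Primary copy: as soon as possible over all processors
    pc = None
    for i, p in enumerate(processors):
        t = p.earliest_start(arrival, comp, deadline)
        if t is not None and (pc is None or t < pc[1]):
            pc = (i, t)
    if pc is None:
        return None
    pc_proc, pc_start = pc
    pc_end = pc_start + comp
    # Backup copy: as late as possible, on another processor (assumption:
    # it must not start before the PC has finished)
    bc = None
    for i, p in enumerate(processors):
        if i == pc_proc:
            continue
        t = p.latest_start(pc_end, comp, deadline)
        if t is not None and (bc is None or t > bc[1]):
            bc = (i, t)
    if bc is None:
        return None  # reject: nothing has been reserved yet
    bc_proc, bc_start = bc
    # Commit both copies
    processors[pc_proc].slots.append(Slot(pc_start, pc_end, f"PC_{name}"))
    processors[bc_proc].slots.append(Slot(bc_start, bc_start + comp, f"BC_{name}"))
    return (pc_proc, pc_start, bc_proc, bc_start)

def on_pc_finished(processors, name, faulty):
    # Fault management: a correct PC frees its BC; otherwise the BC stays
    if not faulty:
        for p in processors:
            p.slots = [s for s in p.slots if s.label != f"BC_{name}"]
```

Reserving both copies only after both have been found keeps rejection free of side effects, and releasing the BC after a correct PC frees processor time for later tasks.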
Parameter                        Value
Number of processors             2-25
Targeted processor load (TPL)    0.25-2.25
Table: Simulation parameters
Task attribute        Distribution    Min    Max
Arrival time a        Poisson         -      -
Computation time c    Uniform         1      20
Deadline d            Uniform         2c     5c
Table: Task parameters
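A task set following the task parameters above could be generated as in the sketch below. The function `generate_tasks` and its `arrival_rate` parameter are hypothetical. The sketch assumes that Poisson arrivals are modelled by exponentially distributed inter-arrival times and that the deadline range 2c to 5c is measured from the arrival time. How the arrival rate is derived from the targeted processor load is not given here, so it is left as a free parameter.

```python
import random

def generate_tasks(n, arrival_rate, seed=None):
    """Generate n aperiodic tasks as dicts with arrival, computation, deadline."""
    rng = random.Random(seed)
    t = 0.0
    tasks = []
    for i in range(n):
        # Poisson arrival process: exponential inter-arrival times
        t += rng.expovariate(arrival_rate)
        # Computation time c drawn uniformly between 1 and 20
        c = rng.uniform(1, 20)
        # Deadline drawn uniformly between 2c and 5c after arrival (assumption)
        d = t + rng.uniform(2 * c, 5 * c)
        tasks.append({"id": i, "arrival": t, "computation": c, "deadline": d})
    return tasks
```

A lower bound of 2c on the deadline window leaves enough room for a primary copy followed by a backup copy of the same length.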
Results
Figure: Task rejection rate (TPL=1). x-axis: number of processors (2 to 24); y-axis: rejection rate.
Figure: Processor load (TPL=1). x-axis: number of processors (2 to 24); y-axis: processor load.
Figure: Time to next fault in case of permanent fault (TPL=1). x-axis: number of processors (2 to 24); y-axis: mean time to next fault (time units).