Poster: Fault-Tolerant Multi-Processor Scheduling with Backup Copy Technique

(1)

HAL Id: hal-01610745

https://hal.inria.fr/hal-01610745

Submitted on 5 Oct 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Poster: Fault-Tolerant Multi-Processor Scheduling with Backup Copy Technique

Petr Dobiáš, Emmanuel Casseau, Oliver Sinnen

To cite this version:

Petr Dobiáš, Emmanuel Casseau, Oliver Sinnen. Poster: Fault-Tolerant Multi-Processor Schedul- ing with Backup Copy Technique. Conference on Design and Architectures for Signal and Image Processing (DASIP), Sep 2017, Dresden, Germany. �hal-01610745�

(2)

Fault-Tolerant Multi-Processor Scheduling with Backup Copy Technique

Petr Dobiáš, Emmanuel Casseau and Oliver Sinnen

emmanuel.casseau@irisa.fr

Aim : Development of a run-time, self-adaptive scheduling algorithm for Multi-Processor System-on-Chip taking into account faults and thereby treating unreliable system components

Keywords : Fault-tolerant design, Mapping and scheduling, Multi-processor platform

Context :

^I

Increasing demand on multi-processor systems for high perfor- mance and low power consumption

I

Rising susceptibility of system failure because of transistor scaling and diminution of operating voltage

Issues :

^I

Trade-off between parallel computing (performance) and spatial redundancy on multi-cores (reliability)

I

Mapping and scheduling control on multiprocessor platforms

Assumptions :

I P

identical processors

I

Aperiodic and independent tasks

I

Dynamic mapping and scheduling without preemption

I

Only one processor failure at any instant of time

I

Primary/backup (PB) approach: each task

T

has two iden- tical copies: primary copy (PC) and backup copy (BC)

Principle of Mapping and Scheduling Based on the Primary/Backup Approach

Mapping and scheduling of a new task T3

Algorithm to map and schedule tasks Fault management algorithm

Fault management after execution of PC_1 Mapping and scheduling of a new task T2

time (AU) P1

P2

P3

PC_1 T2

BC_1 PC_2

BC_2 time = 2

0 5 10 15

time (AU) P1

P2

P3

PC_1 T3

BC_1 PC_2

BC_2

PC_3

BC_3 time = 3

0 5 10 15

time (AU) P1

P2

P3

PC_1

PC_2

BC_2

PC_3

BC_3 time = 5

0 5 10 15

PC_1 detected as correct: deallocate BC_1

Faulty execution

of PC?

Wait for results of BC

Deallocate BC NO

YES After execution

of PC, the fault detection me- chanism reports Map and

schedule PC (as soon as possible)

Map and schedule BC

(as late as possible)

PC and BC schedules

found?

Reject the task YES

NO

Commit the task

Figure:System model

Parameter Value

Number of processors 2-25 Targeted processor load (

TPL

) 0.25-2.25

Table:Simulation parameters

Task attribute Distribution Min Max

Arrival time

a

Poisson - -

Computation time

c

Uniform 1 20 Deadline

d

Uniform 2

c

5

c

Table:Task parameters

Results

2 4 6 8 10 12 14 16 18 20 22 24 Number of processors

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

Rejection rate

Task rejection rate (TPL=1)

0.2 0.3 0.4 0.5 0.6

Processor load

Processor load (TPL=1)

45 50 55 60 65 70

Mean time to next fault (time units)

Time to next fault in case of permanent fault (TPL=1)