Max-stretch minimization on an edge-cloud platform

HAL Id: hal-02972296
https://hal.inria.fr/hal-02972296
Submitted on 20 Oct 2020

Anne Benoit, Redouane Elghazi, Yves Robert

To cite this version:
Anne Benoit, Redouane Elghazi, Yves Robert. Max-stretch minimization on an edge-cloud platform. [Research Report] RR-9369, Inria - Research Centre Grenoble – Rhône-Alpes. 2020, pp.37. ⟨hal-02972296⟩

ISSN 0249-6399 — ISRN INRIA/RR--9369--FR+ENG

RESEARCH REPORT N° 9369 — October 2020

Max-stretch minimization on an edge-cloud platform

RESEARCH CENTRE GRENOBLE – RHÔNE-ALPES
Inovallée, 655 avenue de l'Europe, Montbonnot

Anne Benoit, Redouane Elghazi, Yves Robert
Project-Team ROMA

Research Report n° 9369 — October 2020 — 37 pages

Abstract: We consider the problem of scheduling independent jobs that are generated by processing units at the edge of the network. These jobs can either be executed locally, or sent to a centralized cloud platform that can execute them at greater speed. Such edge-generated jobs may come from various applications, such as e-health, disaster recovery, autonomous vehicles or flying drones. The problem is to decide where and when to schedule each job, with the objective to minimize the maximum stretch incurred by any job. The stretch of a job is the time spent by that job in the system divided by the minimum time it could have taken if the job were alone in the system. We formalize the problem and explain the differences with other models that can be found in the literature. We prove that minimizing the max-stretch is NP-complete, even in the simpler instance with no release dates (all jobs are known in advance). This result comes from the proof that minimizing the max-stretch with homogeneous processors and without release dates is NP-complete, a complexity problem that was left open before this work. We design several algorithms to propose efficient solutions to the general problem, and we conduct simulations based on real platform parameters to evaluate the performance of these algorithms.

Résumé (translated from French): We consider the scheduling of independent jobs that are generated by edge processors. These jobs can either be executed locally, or sent to a more powerful cloud platform, in which case the uplink and downlink communications must be taken into account. This kind of two-level edge-cloud platform has many applications. The problem is to decide where and when to execute each job, with the objective of minimizing the maximum stretch. The stretch of a job is the ratio of the time spent by the job in the system over the time it would have spent had it been alone in this system; it therefore measures a slowdown factor in response time. We formalize the problem and explain the differences with classical models. We prove NP-completeness results on the complexity of the offline problem, building on a new result for a classical open problem, the minimization of the stretch with homogeneous processors and without release dates. We propose several heuristics for the general online problem, which we compare through simulations based on parameters of existing platforms.

1 Introduction

Edge-Cloud computing is a recent paradigm that is currently deployed by many vendors (see [19, 20, 21, 27] and many others). The idea is to execute some jobs in-situ, directly on the edge server where they originate from, thereby providing a flexible, cost-effective and decentralized solution that dramatically reduces the volume of data transfers. But some jobs may be too demanding in terms of computing power, or some edge servers may be overloaded because they launch too many jobs. This calls for coupling the edge server with a powerful cloud platform, where some jobs can be delegated whenever needed.

Deciding which jobs should be executed locally and which ones should be communicated (up and down) to the cloud platform is a challenging problem. The main focus of this paper is to lay the foundations for solving the instance of the problem that deals with response time as a single optimization criterion. Response time is a very important criterion, in particular for the users, and arguably the most important one in a real-time framework. But it is not the only criterion to optimize on an edge-cloud platform: energy consumption and resource costs are important criteria too. However, we show that dealing with response time is already an intricate problem, and we leave multi-objective optimization for future work.

The response time for a job [12, 28] (also called flow time in the scheduling literature [8]) is the time spent by that job in the system, starting from its release date and up to final completion (including output transfers if any). The standard scheduling objective is to minimize the maximum response time over all jobs. (Minimizing the average response time of all jobs, or equivalently the sum of the response times of all jobs, also called total flow time, has been extensively studied too [8].) However, the maximum response time is not appropriate to ensure fairness, because giving the same response time to all jobs results in worse performance for short jobs, compared to the performance achieved for long ones. In order to provide a fair processing of jobs, job lengths should be taken into account. The stretch [11, 3] is the metric used to ensure fairness among jobs, and it is defined as the response time normalized by the job length. More precisely, the stretch is the response time divided by the best possible execution time for the job if it were alone on the platform. Therefore, a job stretch measures how the performance of a job is degraded compared to a system dedicated exclusively to this job. The maximum stretch is the maximum of the stretches of all jobs, and provides a measure of the responsiveness of the system: it quantifies the user expectation that the response time should be proportional to the load incurred by the execution of the job on the system. For an illustration, consider two jobs released at the same time, one lasting 1 hour and the other 10 hours. With a single processor, executing the long job first leads to a maximum stretch of 11, while executing the short job first leads to a maximum stretch of 1.1. Intuitively, the latter schedule is fairer to users than the former. A word of caution: assume now that we have one edge processor and one cloud processor for the two jobs. For each of them, the time that it would spend in a dedicated system must be computed as the minimum of its execution times on the edge and on the cloud (including transfers), if it was the only job.
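For concreteness, both max-stretch values in this illustration follow directly from the definition (both jobs released at time 0, times in hours):

$$\text{long job first: } \max\Bigl(\tfrac{10}{10},\, \tfrac{10+1}{1}\Bigr) = 11, \qquad \text{short job first: } \max\Bigl(\tfrac{1}{1},\, \tfrac{1+10}{10}\Bigr) = 1.1.$$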

In this paper, we focus on the maximum stretch as the optimization metric, and we investigate how to minimize the maximum stretch of independent jobs executing on an edge-cloud platform. This is an online scheduling problem, as the jobs have release dates that are unknown until the jobs are submitted. The first task is to design a realistic model for such a platform. In a nutshell, we propose a model with preemption and possible re-execution, but no migration: jobs are either executed on the edge processor where they originate from, or on a cloud processor, and can be dynamically re-assigned to another resource. If this is the case, the execution must then restart from scratch, thus the time spent executing the job until re-assignment is lost. Note that migration would require some kind of checkpoint mechanism to be able to restart the job on another resource from its current state.

If a job is executed on the cloud, up and down transfers are accounted for. We allow for computation and communication overlap, and many communications can occur in parallel across edge and cloud processors. However, we enforce the one-port full-duplex communication model, which states that any resource cannot be involved in two sending operations, or in two receiving operations, at the same time-step, but sending and receiving simultaneously is allowed. Independent sender/receiver pairs can communicate in parallel. A given message may be preempted (and resumed later) if the sender or the receiver has a more urgent message to process (because of the release of a new job). This communication model [16, 18, 32, 33] has been advocated as being much more accurate than the traditional macro-dataflow communication model of the scheduling literature, which allows a processor to, say, send 100 different messages in parallel without accounting for any bandwidth limitation.

After designing the model, we assess complexity results. The offline problem is shown to be NP-hard, even in the simpler instance with no release dates (all jobs are known in advance). This result comes from the proof that minimizing the max-stretch with homogeneous processors and without release dates is NP-complete, a problem whose complexity was left open before this work. Given these negative (but somewhat expected) results, we introduce several heuristics for the general online problem, and compare their performance through simulations. In summary, the main contributions of this paper are the following:

• A realistic model for scheduling jobs on edge-cloud platforms;

• Several complexity results for the offline problem, proving in particular the NP-hardness of an open problem;

• New heuristics and their experimental comparison for the online problem.

The rest of the paper is organized as follows. We first discuss related work in Section 2. The detailed model of the edge-cloud platform is provided in Section 3, along with the MinMaxStretch-EdgeCloud problem definition.

We then prove the NP-completeness of MinMaxStretch-EdgeCloud in Section 4, and we design several heuristic algorithms to solve this problem in Section 5. Finally, extensive simulations are presented in Section 6 to evaluate and compare the algorithms. Conclusions and directions for future work are presented in Section 7.

2 Related work

We discuss related work along three topics: edge-cloud platforms (Section 2.1), communication models (Section 2.2), and maximum stretch (Section 2.3).

2.1 Edge-cloud platforms

A range of applications of edge computing is given in [7], including e-health, disaster recovery, autonomous vehicles, and flying drones. Although no specific model is given, a state of the art is established, with a list of objectives and constraints that have been studied, such as delay, bandwidth, energy, QoS assurance, etc. Some challenges of edge computing are also discussed in [22].

Some papers influenced our choices related to the model. For example, we chose a model that allows preemption but not migration. This was influenced by papers such as [1], where it is shown that allowing migration does not affect the efficiency of the algorithms that minimize the average flow time, whereas allowing preemption does. This is also proven for the minimization of the average stretch in [2].

We also reviewed papers specifically to design our edge-cloud model. For example, in [25], even though the concern is not exactly the same as ours, a model is proposed for mobile-edge computing systems. Our model is simpler than theirs because we do not consider power consumption, whereas they do. Existing models include those from [30] and [29], where the focus is on the placement of data stream processing applications. However, this latter approach is application-specific, whereas our model aspires to be agnostic of the nature of the application.

Finally, we decided to consider heterogeneous edge processors but kept homogeneity inside the cloud platform. This decision is motivated by the fact that users can always request identical virtual machines on cloud platforms. On the theoretical side, dealing with heterogeneous processors might be an obstacle to finding competitive algorithms: in some cases [15], having heterogeneous processors invalidates the theoretical bounds of some usual scheduling algorithms. However, it is not difficult to extend our model with heterogeneous cloud processors, and all the algorithms can be modified in a straightforward manner to handle a fully heterogeneous edge-cloud platform.

2.2 Communication model

Contention-aware task scheduling has been considered for many years in the scheduling literature [16]. In particular, [32, 33] showed through simulations that taking contention into account is essential for the generation of accurate schedules. They investigate both end-point and network contention. Here, end-point contention refers to the bounded multi-port model [17]: the volume of messages that can be sent/received by a given processor is bounded by the limited capacity of its network card. Network contention refers to the one-port model, which has been advocated by [6] because "current hardware and software do not easily enable multiple messages to be transmitted simultaneously." Even if non-blocking multi-threaded communication libraries allow for initiating multiple send and receive operations, all these operations "are eventually serialized by the single hardware port to the network." Experimental evidence of this fact has been reported by [31], which reports that asynchronous sends become serialized as soon as message sizes exceed a few megabytes. The one-port model fully accounts for end-point contention. Coupled with message preemption, it provides a very realistic model for communications.

2.3 Stretch

The stretch is a particular case of weighted response time [11] and has been studied extensively because of its fairness. Here is a quote from [5]: “Flow time measures the time that a job is in the system regardless of the service it requests; the stretch measure relies on the intuition that a job that requires a long service time must be prepared to wait longer than jobs that require small service times.” An interesting analogy is made in [13] between maximum stretch and distributive justice, defined as the perceived fairness in the way costs and rewards are shared within a group of individuals.

Stretch optimization was studied for independent jobs without preemption [4], bag-of-tasks applications [24, 10], multiple parallel task graphs [9], and for sharing broadcast bandwidth between client requests [34].

On the algorithmic side, [26] proposes a greedy algorithm called Shortest Remaining Processing Time (SRPT), which is an approximation for the online case with the average stretch as an objective. Then, [3] proposes an algorithm for minimizing the maximum stretch in the online case with a single processor. This algorithm is optimal for the offline problem, and is $\Delta$-competitive for the online problem, where $\Delta$ is the ratio between the longest and the shortest job. Another version of this algorithm is given in [4], with a better time complexity but similar bounds on the resulting stretch.

3 Model

We introduce the framework in Section 3.1, we survey schedule constraints in Section 3.2, and we work out an example in Section 3.3.

3.1 Framework

We consider a two-level platform, with $P_c$ homogeneous processors in a cloud, and $P_e$ edge computing units. The cloud processors have a speed normalized to 1, while edge computing units run at a slower pace. Hence, the $j$-th edge computing unit, with $1 \leq j \leq P_e$, operates at a speed $s_j \leq 1$.

The application consists of $n$ independent jobs $J_1, \ldots, J_n$, where each job is issued from an edge computing unit. Job $J_i$ is generated by the edge computing unit $o_i$ (with $1 \leq o_i \leq P_e$), and this edge computing unit must obtain the result of job $J_i$, either by executing $J_i$ locally, or by delegating the job to the cloud and getting the result back. Parameters for job $J_i$ are as follows:

• $o_i$ is the origin processor on the edge;
• $w_i$ is the number of operations required by the job;
• $r_i$ is the release date;
• $up_i$ and $dn_i$ are the communication times required to send the job to the cloud and get the result back (uplink/downlink communications).

The execution time of job $J_i$ hence depends on whether it is executed directly at the edge processing unit where the job originates from, or whether it is sent for execution on the cloud. If the job is executed on the edge, it takes a time $t^e_i = \frac{w_i}{s_{o_i}}$, since the edge processing unit $o_i$ operates at speed $s_{o_i}$. Computation may be faster on the cloud, but we must then account for communication time, hence an execution on the cloud takes a time $t^c_i = up_i + w_i + dn_i$ (send the job up to the cloud, compute at speed 1, and get the result back down). In both cases, the execution of the job can be preempted, which means that we can interrupt its execution (to schedule another job), and resume it at a later time. Once started on a given resource, the execution cannot migrate to another resource, but re-execution from scratch is possible (and the time spent up to re-execution is lost).

Let $f_i$ denote the time at which the execution of $J_i$ is completed. Then, the stretch $S_i$ of job $J_i$ is defined by:

$$S_i = \frac{f_i - r_i}{\min(t^e_i, t^c_i)}.$$

Indeed, job $J_i$ is released at time $r_i$ and spends a time $f_i - r_i$ in the system, while it could have taken a time $\min(t^e_i, t^c_i)$ in the best case of a dedicated system. Hence, the stretch of each job is equal to one if the job is executed in the minimum possible time, and we want the maximum stretch (over all jobs) to be as close to one as possible. Overall, the objective function is to minimize the maximum stretch, which is defined as $\max_{1 \leq i \leq n} S_i$.
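To make these quantities concrete, here is a minimal C++ sketch (our own illustration, not the paper's implementation) of the two execution times and the resulting stretch:

    #include <algorithm>

    // Parameters of a job J_i, following Section 3.1.
    struct Job {
        double w;   // number of operations
        double r;   // release date
        double up;  // uplink communication time
        double dn;  // downlink communication time
        double s;   // speed of its origin edge processor (s <= 1)
    };

    // Time on the edge: t^e_i = w_i / s_{o_i}.
    double edgeTime(const Job& j) { return j.w / j.s; }

    // Time on the cloud: t^c_i = up_i + w_i + dn_i (cloud speed is 1).
    double cloudTime(const Job& j) { return j.up + j.w + j.dn; }

    // Stretch S_i = (f_i - r_i) / min(t^e_i, t^c_i), for completion time f.
    double stretch(const Job& j, double f) {
        return (f - j.r) / std::min(edgeTime(j), cloudTime(j));
    }

For instance, a job with $w = 4$, $s = 0.5$, $up = 2$, $dn = 1$ has $t^e = 8$ and $t^c = 7$, so completing it at time 8 (released at time 0) yields a stretch of $8/7$, as in the Greedy example of Section 5.2.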

While several jobs may be competing for cloud resources, several constraints must be enforced within a schedule of jobs. We overlap computations and communications, and we consider communication channels to be full-duplex between an edge processor and a cloud processor, hence it is possible to perform in parallel a computation, an uplink communication, and a downlink communication. However, communications involving a common processor must be serialized: if two jobs originating from the same edge processing unit are delegated to the cloud, we cannot perform the two uplink (and then downlink) communications in parallel. Similarly, if two jobs (that may come from different edge processing units) are sent to the same cloud processor, their communications (uplink and then downlink) must be serialized.

Furthermore, communications, just as computations, are preemptive, which means that we can interrupt a communication (for instance, to schedule a communication for a smaller job), and resume the interrupted communication at a later time.

Optimization problem: The goal is to find a schedule that respects all constraints, with the aim of minimizing the maximum stretch. This problem is denoted as MinMaxStretch-EdgeCloud.

3.2 Schedules

For each job $J_i$, we first need to decide where it is executed: $alloc(i) = 0$ if the job is executed on its local edge processor $o_i$; otherwise we set $alloc(i) = k$, where $k$ is the cloud processor on which $J_i$ is executed ($1 \leq k \leq P_c$). Formally, a schedule consists of sets of disjoint execution intervals $E_i$ for each job (recall that we allow preemption), and disjoint uplink/downlink communication intervals $U_i(o_i, alloc(i))$ and $D_i(alloc(i), o_i)$ for jobs executed on the cloud ($alloc(i) \neq 0$).

Several constraints must be ensured for the schedule to be valid. In particular, for any two jobs $J_i, J_{i'}$ (with $1 \leq i, i' \leq n$) executed on the same processor, i.e., $alloc(i) = alloc(i') \neq 0$ (cloud), or $alloc(i) = alloc(i') = 0$ and $o_i = o_{i'}$ (edge), all their execution intervals must be disjoint: $\forall I \in E_i, \forall I' \in E_{i'}, I \cap I' = \emptyset$.

We must also ensure that communications are serialized, i.e., for any two sets $U_i(j, k)$ and $U_{i'}(j', k')$ (resp. $D_i(k, j)$ and $D_{i'}(k', j')$), if $j = j'$ or $k = k'$, then the communication originates from or targets the same processor, and hence all intervals must be pairwise disjoint. Finally, we must ensure that all computations and communications are done, in the correct order. Given a set of intervals $E$, we denote by $\min(E)$ (resp. $\max(E)$) the smallest (resp. largest) extremity of all intervals in $E$. Hence, for job $J_i$ ($1 \leq i \leq n$):

• If $alloc(i) = 0$, then $\sum_{I \in E_i} |I| \geq \frac{w_i}{s_{o_i}}$;
• If $alloc(i) \neq 0$, then:
  – $\sum_{I \in E_i} |I| \geq w_i$;
  – $\sum_{I \in U_i(o_i, alloc(i))} |I| \geq up_i$;
  – $\sum_{I \in D_i(alloc(i), o_i)} |I| \geq dn_i$;
  – we must complete uplink communications before starting computation: $\max(U_i(o_i, alloc(i))) \leq \min(E_i)$;
  – and we must complete computation before starting downlink communications: $\max(E_i) \leq \min(D_i(alloc(i), o_i))$.
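As an illustration, the disjointness and coverage constraints above can be checked mechanically; below is a hypothetical C++ helper (our own sketch, not from the paper's code) that verifies that a set of intervals is pairwise disjoint and sums to at least a required amount of work:

    #include <vector>
    #include <algorithm>

    struct Interval { double start, end; };  // assumes start <= end

    // Check that intervals are pairwise disjoint and cover at least `need`
    // time units, as required for the execution/communication intervals above.
    bool validIntervals(std::vector<Interval> v, double need) {
        std::sort(v.begin(), v.end(),
                  [](const Interval& a, const Interval& b) { return a.start < b.start; });
        double total = 0.0;
        for (size_t i = 0; i < v.size(); ++i) {
            if (i > 0 && v[i].start < v[i - 1].end) return false;  // overlap
            total += v[i].end - v[i].start;
        }
        return total >= need;
    }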

3.3 Example

An example of a valid execution with one edge processor and one cloud processor is depicted in Figure 1, where $J_1$, $J_4$ and $J_6$ are executed on the edge, while $J_2$, $J_3$ and $J_5$ are sent to the cloud. Job parameters are as follows, where the speed of the edge processor is $\frac{1}{3}$:

• $J_1$: $r_1 = 0$, $w_1 = 1$, $up_1 = dn_1 = 5$;
• $J_2$: $r_2 = 0$, $w_2 = 4$, $up_2 = dn_2 = 2$;
• $J_3$: $r_3 = 3$, $w_3 = 2$, $up_3 = 2$, $dn_3 = 1$;
• $J_4$: $r_4 = 5$, $w_4 = 4/3$, $up_4 = dn_4 = 5$;
• $J_5$: $r_5 = 5$, $w_5 = 2$, $up_5 = 2$, $dn_5 = 1$;
• $J_6$: $r_6 = 6$, $w_6 = 1/3$, $up_6 = dn_6 = 5$.

The execution intervals, as well as uplink/downlink communication intervals, are depicted. At time-step 6, we compute at the same time on the cloud and on the edge, and we perform an uplink communication and a downlink communication. Also, this is the time when $J_6$ preempts job $J_4$, since it can be executed locally within one time unit, while $J_4$ is a longer job and will be delayed by only one time unit.

Figure 1: Example of a valid schedule, with $P_e = P_c = 1$.

One can check, by an exhaustive look at all possible schedules, that this one is optimal. Indeed, jobs $J_1$ and $J_6$ run at their minimum possible time $w_i \times 3$ on the edge, while executing them on the cloud would have cost at least 10 units of communication time, hence they have a stretch of 1. On the other hand, $J_2$ would take a time 12 on the edge, and it is sent to the cloud where the total time is $up_2 + w_2 + dn_2 = 8$, hence also a stretch of 1. Next, consider $J_3$ and $J_5$, which have the same characteristics: they can be executed in time 6 on the edge, but 5 on the cloud. Here, we execute them on the cloud, where they both experience one time-step of delay (waiting for the processor to be available), hence they have a stretch of $\frac{6}{5}$, and they could not have a better stretch on the edge. Finally, job $J_4$ takes a time 5 on the edge, while its minimum time is 4, because it is preempted at time-step 6 to execute the shorter job $J_6$, which would greatly impact the stretch if it was delayed. On the cloud, it would have taken a time of at least $10 + \frac{4}{3}$, hence much greater. Its stretch is $\frac{5}{4}$, which determines the maximum stretch for this example.

Throughout this example, we see that decisions are more difficult to make when there is no knowledge about jobs that will be released in the future. For instance, one could schedule job $J_3$ either on the edge or on the cloud, since it would complete in both cases within time 9, but depending on the jobs that come next (computation-intensive vs. communication-intensive), one decision would be better than the other. The online case is the problem where jobs are not known in advance, while in the offline case, all job parameters are known in advance. In the following, we prove that the problem is difficult even in the offline case, and then we derive heuristics to address the general online problem in Section 5.
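These per-job lower bounds are easy to recompute; the following quick C++ check (our own illustration) prints $t^e_i$, $t^c_i$ and their minimum for the six jobs above:

    #include <cstdio>
    #include <algorithm>

    int main() {
        // {w, up, dn} for J1..J6; edge speed is 1/3, cloud speed is 1.
        double w[]  = {1, 4, 2, 4.0 / 3, 2, 1.0 / 3};
        double up[] = {5, 2, 2, 5, 2, 5};
        double dn[] = {5, 2, 1, 5, 1, 5};
        for (int i = 0; i < 6; ++i) {
            double te = w[i] * 3;               // edge time at speed 1/3
            double tc = up[i] + w[i] + dn[i];   // cloud time at speed 1
            printf("J%d: t_e = %.2f, t_c = %.2f, best = %.2f\n",
                   i + 1, te, tc, std::min(te, tc));
        }
        return 0;
    }

The output matches the values discussed in the text (e.g., for $J_2$: $t^e = 12$, $t^c = 8$).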

4 Problem complexity

This section is devoted to the study of the complexity of MinMaxStretch-EdgeCloud in the offline case. To shorten notations, we write MMSECO instead of MinMaxStretch-EdgeCloud-Offline throughout this section, and MMSECO-Dec is the corresponding decision problem (is it possible to achieve a target maximum stretch?). First, we prove that MMSECO-Dec is in NP in Section 4.1. Then, we derive two main complexity results, which remain true even without release dates (consider that all jobs are released at time 0). The results are the following:

• For a fixed number of processors, MMSECO-Dec is NP-complete in the weak sense (Section 4.2).

• For a variable number of processors, MMSECO-Dec is NP-complete in the strong sense (Section 4.3).

4.1 NP membership

Lemma 1. MMSECO-Dec is in NP.

Proof. A solution to MMSECO-Dec is a sequence of events, where an event is one of the following:

• a job $J_i$ being released;
• a job $J_i$ finishing its uplink communication;
• a job $J_i$ finishing its execution;
• a job $J_i$ finishing its downlink communication.

In between two events, there is no reason to reconsider any decision taken, nor to preempt any computation or communication. There are only two events for the jobs executed on edge processors, and four for those executed on a cloud processor, so there are at most $4n$ events. Therefore, a schedule can be represented in polynomial space: at each of these at most $4n$ time-steps, there might be at most one event per processor. The validity of the solution can then be verified in polynomial time by checking each constraint for a valid schedule, as detailed in Section 3.2. Therefore, MMSECO-Dec is in NP.

4.2 Weak NP-completeness with two processors

We are now ready to prove the NP-completeness of MMSECO-Dec. We first focus on minimizing the maximum stretch for a fixed number of homogeneous processors, and without release dates (hence forgetting about the edge-cloud system); this problem is called MMSH, and its associated decision problem MMSH-Dec. The execution of job $J_i$ hence takes $w_i$ time units, and there are no communications. To the best of our knowledge, the complexity of MMSH has not been established yet, and we prove its NP-completeness. In a second step, we show that it is easy to create an instance of MMSECO-Dec from a general instance of MMSH-Dec, hence proving the NP-completeness of MMSECO-Dec.

As a preliminary, we derive an interesting result about the optimal order of jobs in a schedule for MMSH: with a single processor, jobs should be ordered from the shortest to the longest (i.e., by non-decreasing $w_i$'s).

Lemma 2. For MMSH with a single processor, there always exists a schedule that minimizes the maximum stretch by ordering the jobs from the shortest to the longest, and without preemption.

Proof. First, we observe that preemption is not useful when all jobs have the same release date (which is 0 here). To see this, assume we execute a sequence with preemption $(J_i^{(1)}, J_j^{(1)}, J_i^{(2)})$ of three job chunks, where $J_i$ and $J_j$ are two different jobs. Here, $J_i^{(1)}$ and $J_i^{(2)}$ are two chunks of job $J_i$, and $J_j^{(1)}$ is one chunk of job $J_j$. Swapping the first two chunks leads to the sequence $(J_j^{(1)}, J_i^{(1)}, J_i^{(2)})$. Because $J_i$ and $J_j$ have the same release date, this swap is valid: it does not change the stretch of $J_i$ and can only decrease or leave unchanged the stretch of $J_j$. After a finite number of such swaps, we have a non-preemptive schedule whose stretch does not exceed that of the original schedule with preemption.

Then, to prove that the jobs can be ordered as stated, we consider a schedule that does not order the jobs from the shortest to the longest, and we construct another schedule with fewer mis-orderings, and which has a maximum stretch that is smaller than or equal to the original maximum stretch. Let $(k_1, \ldots, k_n)$ be (the indices of) the ordering of the jobs in the initial schedule, meaning that we execute $J_{k_1}$ first and $J_{k_n}$ last. By assumption, there exists $i$ such that $w_{k_i} > w_{k_{i+1}}$ (i.e., job $J_{k_i}$ is longer than job $J_{k_{i+1}}$). We swap these two jobs, which is possible because all jobs have the same release date, and which does not change the stretch of the other jobs.

Since we consider the maximum stretch of the schedule, let us compare the maximum between the stretches of the two jobs before and after the swap. If $X$ is the starting time of the earlier job, then the values that we consider are:

$$S_b = \max\left(\frac{X + w_{k_i}}{w_{k_i}}, \frac{X + w_{k_i} + w_{k_{i+1}}}{w_{k_{i+1}}}\right) \text{ before the swap;}$$

$$S_a = \max\left(\frac{X + w_{k_{i+1}}}{w_{k_{i+1}}}, \frac{X + w_{k_{i+1}} + w_{k_i}}{w_{k_i}}\right) \text{ after the swap.}$$

As we have $\frac{X + w_{k_{i+1}}}{w_{k_{i+1}}} < S_b$ and $\frac{X + w_{k_{i+1}} + w_{k_i}}{w_{k_i}} < S_b$, we indeed get $S_a < S_b$, so the stretch after the swap is at most the stretch before the swap. We can repeat the operation until there are no mis-orderings anymore, and we get a schedule that orders the jobs from the shortest to the longest and has a maximum stretch less than or equal to that of the initial schedule.
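Lemma 2 is easy to check empirically on a single processor: the sketch below (our own illustration, not the paper's code) computes the max-stretch of every permutation of a small job set and confirms that a shortest-first order achieves the minimum.

    #include <cstdio>
    #include <vector>
    #include <algorithm>

    // Max-stretch of a non-preemptive order on one processor (release dates 0).
    double maxStretch(const std::vector<double>& w) {
        double t = 0.0, ms = 0.0;
        for (double wi : w) {
            t += wi;                    // completion time of this job
            ms = std::max(ms, t / wi);  // stretch = completion time / length
        }
        return ms;
    }

    int main() {
        std::vector<double> w = {3, 1, 4, 2};
        std::sort(w.begin(), w.end());
        double best = 1e300;
        std::vector<double> p = w;
        do {
            best = std::min(best, maxStretch(p));
        } while (std::next_permutation(p.begin(), p.end()));
        // Shortest-first (w is sorted) matches the best permutation.
        printf("SPT: %.4f, best: %.4f\n", maxStretch(w), best);
        return 0;
    }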

Theorem 1. MMSH-Dec with two homogeneous processors is NP-complete in the weak sense.

Proof. It is easy to see that MMSH-Dec is in NP, since it is a simpler case of MMSECO-Dec (see Section 4.1). We now prove that it is NP-complete.

Let $I_1$ be an instance of 2-Partition-Eq [14], which is a special case of 2-Partition where the partitions must have equal cardinality: given $2n$ integers $\{a_1, \ldots, a_{2n}\}$ with $2S = \sum_{i=1}^{2n} a_i$, does there exist a subset $I \subset \{1, \ldots, 2n\}$ with $|I| = n$ and $\sum_{i \in I} a_i = S$?

We create an instance $I_2$ of MMSH-Dec with two processors and $2n + 2$ jobs, such that:

• for $1 \leq i \leq 2n$, the execution time of job $J_i$ is $w_i = nS + a_i$;
• the execution time of the two additional jobs $J_{2n+1}$ and $J_{2n+2}$ is $w_{2n+1} = w_{2n+2} = (n+1)S$.

We ask whether it is possible to achieve a stretch of $\frac{n^2+n+2}{n+1}$.

Note that, building upon Lemma 2, we know that each processor will sort its jobs by non-decreasing execution time in an optimal solution, so a schedule is characterized only by a partition $(P_1, P_2)$ of the jobs. We now show that $I_1$ has a solution if and only if $I_2$ has a solution.

• Consider first that $I_1$ has a solution, $I$. We consider the schedule where $P_1$ contains the jobs $J_i$ for $i \in I$, together with $J_{2n+1}$, and $P_2$ contains the remaining jobs ($J_i$ for $i \notin I$, and $J_{2n+2}$). We prove that this schedule has a max-stretch equal to $\frac{n^2+n+2}{n+1}$.

First, note that the sum of job weights in each set is:

$$\sum_{i \in I} (nS + a_i) + (n+1)S = n^2 S + S + (n+1)S = (n^2 + n + 2)S,$$

since there are exactly $n$ jobs in set $I$ (and the same number of jobs for $P_2$).

By Lemma 2, job $J_{2n+1}$ (resp. $J_{2n+2}$) is executed last in $P_1$ (resp. $P_2$), since $w_{2n+1} = w_{2n+2} > w_i$ for all $1 \leq i \leq 2n$. Therefore, these additional jobs both complete at time $(n^2 + n + 2)S$, and they have a stretch of $\frac{n^2+n+2}{n+1}$.

For any other job $J_i$, the stretch is $S_i = \frac{knS + X}{nS + a_i}$, where there are $k \leq n$ jobs before $J_i$ (including $J_i$) with a total weight $knS + X$, where $X = \sum_{j \in Pred(i)} a_j$. Here, $Pred(i)$ denotes the indices of these $k$ jobs, and we have $X \leq S$. We have the following inequalities:

(i) $\frac{knS + X}{nS + a_i} \leq \frac{(n^2+1)S}{nS + a_i} \leq \frac{n^2+1}{n}$ (constraints on $k$ and $X$), and

(ii) $\frac{n^2+1}{n} \leq \frac{n^2+n+2}{n+1}$ (for any $n \geq 1$).

Therefore, job $J_i$ has a stretch $S_i \leq \frac{n^2+n+2}{n+1}$, meaning that the max-stretch of the schedule is $\frac{n^2+n+2}{n+1}$, and hence $I_2$ has a solution.

• Consider now that $I_2$ has a solution: let $(P_1, P_2)$ be a schedule with max-stretch $\frac{n^2+n+2}{n+1}$ or less. We prove that $(P_1, P_2)$ is a partition of the jobs into two sets of equal sum and equal size, with $J_{2n+1} \in P_1$ and $J_{2n+2} \in P_2$.

We first prove that each processor finishes its whole execution at a time $t_{finish} \leq (n^2 + n + 2)S$. Let $J_i$ be the last job to finish; its stretch is $S_i = \frac{t_{finish}}{w_i}$. Since $w_i \leq (n+1)S$ for all jobs, $S_i \geq \frac{t_{finish}}{(n+1)S}$. Hence, since $(P_1, P_2)$ is a solution to $I_2$, it means that $S_i \leq \frac{n^2+n+2}{n+1}$, and therefore $t_{finish} \leq (n^2 + n + 2)S$. Furthermore, since the sum of all execution times is $2(n^2 + n + 2)S$, this means that both processors finish at exactly time $t_{finish} = (n^2 + n + 2)S$.

Now, we specifically look at the two jobs that finish their execution at time $(n^2 + n + 2)S$, one on each processor. We prove that these two jobs are $J_{2n+1}$ and $J_{2n+2}$. For their stretch to be at most $\frac{n^2+n+2}{n+1}$, they need to have an execution time $w_i \geq (n+1)S$. The only jobs for which this constraint holds are $J_{2n+1}$ and $J_{2n+2}$. So, we indeed have $J_{2n+1}$ and $J_{2n+2}$ scheduled on different processors, say $J_{2n+1}$ in $P_1$ and $J_{2n+2}$ in $P_2$ (the two jobs are identical).

Let us now consider the jobs in $P_1$, excluding the large job $J_{2n+1}$. Their total weight is hence $t_{finish} - (n+1)S = n \times nS + S$, which means that there are exactly $n$ of them (the $a_i$'s are small in comparison with $nS$). Since their sum is $n^2 S + S$, we obtain that $\sum_{i \in P_1 \setminus \{2n+1\}} a_i = S$, and the set of their indices is therefore a solution to $I_1$.

Overall, $I_1$ has a solution if and only if $I_2$ has a solution, and the construction of $I_2$ from $I_1$ can obviously be done in polynomial time, hence MMSH-Dec is NP-complete. This NP-completeness result is established in the weak sense since 2-Partition-Eq can be solved by a pseudo-polynomial algorithm.
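For concreteness, the instance construction at the heart of this reduction is straightforward to implement; the following C++ sketch (our own illustration) builds the MMSH-Dec instance $I_2$ from a 2-Partition-Eq instance:

    #include <vector>
    #include <numeric>

    struct MMSHInstance {
        std::vector<long long> w;  // execution times (2n + 2 jobs, 2 processors)
        double targetStretch;      // (n^2 + n + 2) / (n + 1)
    };

    // Build I2 from a 2-Partition-Eq instance {a_1, ..., a_2n}, of sum 2S.
    MMSHInstance buildI2(const std::vector<long long>& a) {
        long long n = static_cast<long long>(a.size()) / 2;
        long long S = std::accumulate(a.begin(), a.end(), 0LL) / 2;
        MMSHInstance inst;
        for (long long ai : a) inst.w.push_back(n * S + ai);  // w_i = nS + a_i
        inst.w.push_back((n + 1) * S);  // J_{2n+1}
        inst.w.push_back((n + 1) * S);  // J_{2n+2}
        inst.targetStretch = double(n * n + n + 2) / double(n + 1);
        return inst;
    }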

We can now derive the result for the original edge-cloud problem, by performing a simple reduction from MMSH-Dec.

Corollary 1. MMSECO-Dec is NP-complete in the weak sense, even with one edge processor, one cloud processor, and no release dates.

Proof. By Lemma 1, MMSECO-Dec is in NP. We perform a reduction from MMSH-Dec with two processors, where $P_e = P_c = 1$, the speed of the edge processor is set to 1, and all communication costs of jobs are set to $up_i = dn_i = 0$. The problem is then exactly equivalent to the original instance, which concludes the proof.

4.3 Strong NP-completeness for a variable number of processors

Theorem 2. MMSH-Dec with a variable number of homogeneous processors is NP-complete in the strong sense.

Proof. We already showed that MMSH-Dec is in NP when proving Theorem 1. We now prove that it is strongly NP-hard, i.e., it remains NP-hard even if the input parameters are bounded above by a polynomial in p, where p is the number of processors.

Let $I_1$ be an instance of 3-Partition [14]: given $3n$ integers $\{a_1, \ldots, a_{3n}\}$ whose sum is $nB = \sum_{i=1}^{3n} a_i$, and with $\frac{B}{4} < a_i < \frac{B}{2}$ for $1 \leq i \leq 3n$, does there exist a partition of $\{1, \ldots, 3n\}$ into $n$ subsets $S_1, \ldots, S_n$ such that $\sum_{i \in S_j} a_i = B$ for $1 \leq j \leq n$? We create an instance $I_2$ of MMSH-Dec with $n$ processors and $4n$ jobs as follows:

• for $1 \leq i \leq 3n$, the execution time of job $J_i$ is $w_i = a_i$;
• for $3n+1 \leq i \leq 4n$, the execution time of job $J_i$ is $w_i = \frac{B}{2}$; these $n$ additional jobs are hence larger than any other job;
• the bound on the max-stretch is set to 3.

We now prove that $I_1$ has a solution if and only if $I_2$ has a solution, and that the solution of $I_2$ always maps exactly one of the additional large jobs onto each processor.

• Consider first that $I_1$ has a solution, $\{S_1, \ldots, S_n\}$, with $\sum_{i \in S_j} a_i = B$. Given the constraints on the $a_i$'s, there are exactly three elements per set $S_j$. We consider the schedule with the jobs from $S_j$ and one additional job on processor $j$, hence four jobs per processor. As already stated, jobs are scheduled from the shortest to the longest, and therefore the $k$-th job always has a stretch of at most $k$ (its completion time is at most $k$ times its own length, since the $k-1$ jobs before it are no longer). Hence, the first three jobs have a stretch of at most 3 (equal to 3 if the three jobs each have weight $\frac{B}{3}$, in which case the third job has a stretch $\frac{B}{B/3} = 3$). The last job finishes at time $\frac{3B}{2}$ and has length $\frac{B}{2}$, hence a stretch of 3. Finally, each processor has a max-stretch of 3, hence we have constructed a solution to $I_2$.

• Consider now that $I_2$ has a solution, that is, a partition of jobs onto processors $\{P_1, \ldots, P_n\}$ such that the max-stretch is at most 3. Since the max-stretch is 3 and all jobs have length at most $\frac{B}{2}$, no job can finish after $\frac{3B}{2}$, and only a job of length $\frac{B}{2}$ can finish at $\frac{3B}{2}$ (a smaller job would lead to a larger stretch). Furthermore, the total work is $\frac{3nB}{2}$ and there are $n$ processors, so each processor finishes exactly at time $\frac{3B}{2}$, with one of the $n$ large jobs. Consider $S_j = P_j \setminus \{k \mid k > 3n\}$: $S_j$ consists of three jobs of sum $B$ (since all job weights are between $\frac{B}{4}$ and $\frac{B}{2}$), hence the $S_j$'s are a solution to $I_1$.

To conclude, $I_1$ has a solution if and only if $I_2$ has a solution, and we have performed in polynomial time the reduction from 3-Partition, which is NP-hard in the strong sense; therefore, MMSH-Dec is NP-complete in the strong sense when considering a variable number of homogeneous processors.

Corollary 2. MMSECO-Dec with an arbitrary number of processors is strongly NP-complete.

Proof. By Lemma 1, MMSECO-Dec is in NP. We perform a reduction from MMSH-Dec with $p$ processors, where $P_e = 1$, $P_c = p - 1$, the speed of the edge processor is set to 1, and all communication costs of jobs are set to $up_i = dn_i = 0$. The problem is then exactly equivalent to the original instance, since all jobs originate from the only edge processor, they can be scheduled either on the edge processor or on one of the cloud processors, and each of these processors (cloud and edge) behaves similarly; the platform is hence fully homogeneous.

5 Algorithms

Since MinMaxStretch-EdgeCloud is NP-complete even in the offline case, we design general heuristic algorithms to address the problem in the most general online setting, with $P_e$ different edge computing units generating jobs that can be processed either locally, or on one of the $P_c$ cloud processors.

We first propose a strategy that does not exploit the cloud platform, but executes all jobs locally following a competitive strategy (Section 5.1). We then design a simple greedy algorithm that greedily decides whether to execute a job locally or on the cloud (Section 5.2). Finally, we revisit efficient algorithms from the literature to adapt them to the edge-cloud setting, building upon the shortest remaining processing time of a job in Section 5.3, and on an earliest-deadline-first strategy in Section 5.4.

The algorithms are event-based, i.e., we reconsider decisions only when an event occurs, so that they all remain of polynomial cost. There are at most $4n$ events, corresponding to the time-steps at which job $J_i$, for $1 \leq i \leq n$:

1. is released at edge processor $o_i$ (event $E_r(i)$);
2. completes its execution on processor $alloc(i)$ (event $E_f(i)$);
3. completes an uplink communication (event $E_{up}(i)$);
4. completes a downlink communication (event $E_{dn}(i)$).

5.1 Baseline: Edge-Only strategy

We first consider a simple heuristic, Edge-Only, that does not use the cloud platform, but where all jobs are executed locally on the edge. This might actually be a good strategy when edge processing units have a good processing speed and/or when communications are costly.

Since all edge processors are independent with this strategy, the problem consists in minimizing the max-stretch on one processor, and this is done independently at each edge processing unit. We use the Stretch-So-Far Earliest-Deadline-First algorithm, designed by Michael Bender et al. [3]. Indeed, this algorithm is $\Delta$-competitive with a single processor, where $\Delta$ is the ratio between the longest and the shortest job.

Note that we have to modify the algorithm of [3] to account for the edge-cloud framework: when computing the stretch of a job, we consider its potential execution time on the cloud. We will compare heuristics exploiting the cloud with this edge-only strategy in Section 6.

5.2 Greedy

We design a greedy heuristic, called Greedy, that schedules first the job that would currently achieve the highest stretch, as it is the job that might impact the maximum stretch the most. Decisions are taken only when an event occurs (new job released, job completion, completion of an uplink or downlink communication).

The algorithm works as follows: at each event, as long as there are available resources, we compute for each job the minimum stretch that it could achieve by starting immediately on an available resource. We select the job that maximizes this value, and execute it on the resource (edge or cloud) on which it achieves this minimum stretch.
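A minimal C++ sketch of this selection step (our own illustration, with availability abstracted by two flags; all names are ours, not from the paper's code):

    #include <vector>
    #include <algorithm>

    struct Job { double w, r, up, dn, s; };  // as in the Section 3.1 sketch
    double edgeTime(const Job& j)  { return j.w / j.s; }
    double cloudTime(const Job& j) { return j.up + j.w + j.dn; }

    // Stretch of job j if it starts now and runs to completion on the edge
    // (onEdge = true) or on a cloud processor (onEdge = false).
    double stretchIfStartedNow(const Job& j, double now, bool onEdge) {
        double finish = now + (onEdge ? edgeTime(j) : cloudTime(j));
        return (finish - j.r) / std::min(edgeTime(j), cloudTime(j));
    }

    // Pick the pending job with the highest achievable minimum stretch;
    // returns its index, or -1 if nothing can be scheduled.
    int pickJob(const std::vector<Job>& pending, double now,
                bool edgeFree, bool cloudFree) {
        int best = -1;
        double bestStretch = -1.0;
        for (int i = 0; i < (int)pending.size(); ++i) {
            double s = 1e300;  // minimum stretch over the available resources
            if (edgeFree)  s = std::min(s, stretchIfStartedNow(pending[i], now, true));
            if (cloudFree) s = std::min(s, stretchIfStartedNow(pending[i], now, false));
            if (s < 1e300 && s > bestStretch) { bestStretch = s; best = i; }
        }
        return best;
    }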

The complexity of the routine that runs at each event is $O(n(1 + P_c))$: we compute the stretch of each job, for either a local execution, or an execution on one of the $P_c$ cloud computing units. Since there are at most $4n$ events, the worst-case complexity is $O(n^2 P_c)$. Of course, in an online setting, we expect the number of jobs that are simultaneously processed in the system to be far smaller than $n$. Hence, the average complexity is expected to be much lower.

Figure 2: Example of a schedule obtained with Greedy.

We give an example with $P_c = 1$, $P_e = 2$, $n = 4$, and both edge speeds $s = 0.5$. Two jobs are released at time 0: $J_1$ on the first edge processing unit $E_1$, and $J_2$ on the second edge processing unit $E_2$. Both jobs have the same parameters: $w = 4$, $up = 2$, and $dn = 1$. A stretch of 1 can be obtained by scheduling these jobs on the cloud, but since there is only one cloud processing unit, one of the jobs is executed locally, say $J_1$, on the first edge processor $E_1$, with a stretch of $\frac{8}{7}$. At time 2, $J_2$ completes its uplink communication (event $E_{up}(2)$), and a new job spawns at each processor ($J_3$ on $E_1$ and $J_4$ on $E_2$), with $w = 2$, $up = 2$, $dn = 1$ (events $E_r(3)$ and $E_r(4)$). Both of these new jobs have a minimum stretch of 1 with a local execution, and $J_2$ still has a minimum stretch of 1 as well. However, $J_1$ has the maximum stretch and it pursues its execution on $E_1$, while $J_3$ is sent to the cloud. $J_4$ begins its execution on $E_2$, expecting a stretch of 1 if it is not preempted. At time 4, corresponding to event $E_{up}(3)$, the maximum minimum stretch is obtained by $J_3$ ($\frac{5}{4}$), which preempts $J_2$ on the cloud. As the cloud is currently unavailable, $J_2$ takes priority on $E_2$, thus preempting $J_4$. When the cloud is available at time 6, $J_2$ resumes its execution on the cloud, as it is expected to end faster on the cloud. It finishes at time 9 with a stretch of $\frac{9}{7}$. The max-stretch is finally reached by $J_4$, which resumes its execution on $E_2$ after being delayed by $J_2$, and completes at time 8, with a stretch of $\frac{3}{2}$; see Figure 2. The parts of execution or communication that were not used because of preemption are shown in grey.

5.3 Shortest Remaining Processing Time

The SRPT heuristic builds on the classical Shortest Remaining Processing Time strategy, which assigns to a processing unit the job that it can finish the earliest, so that the shortest jobs are executed with little delay. Again, decisions are taken at each event.

The incentive for using this algorithm is that it is an $O(1)$-competitive algorithm for the average stretch problem, as shown in [26]. However, this is not the case for the maximum stretch: a long job could be blocked by short jobs for an arbitrarily long duration. Note that migration is not allowed, but full re-execution is possible, thus a job that has been preempted by another job might start again (from scratch) on another processor, if it becomes the job that will finish first on that other processor.

The algorithm takes new decisions at each event. As long as we still have a processor on which we can execute a job, we choose the job that can be completed the earliest, execute it on the processor that can complete it the earliest, and remove this job and processor from the list of available jobs/processors. The algorithm is detailed in Algorithm 1, where we handle a priority queue with jobs ordered by release dates (releases), and we keep track of the next release event nextRelease, corresponding to an event $E_r(i)$, with $1 \leq i \leq n$. Also, nextFinish corresponds to the next completion event, which can be either the completion of a job, or the completion of a communication. Once the time of the next event is known ($t_{new}$), we compute the remaining work and communication at this time, given that processors have been working and communicating for a time $t_{new} - t$. These updates are done through the procedure execute, which is detailed in Algorithm 2; initially $w^c_{i,k} = w^e_i = w_i$, $up^c_i = up_i$ and $dn^c_i = dn_i$, and these values may then decrease when some work has already been completed for job $J_i$. Note that at the first iteration of the loop, the arrays executing, upcom and downcom are all empty, hence execute does nothing. We effectively start scheduling jobs once the first job release has occurred, at the first time $t_{new}$. Then, we compute all possible finish times of jobs (on edge or cloud) and decide on the SRPT schedule by calling the SRPT-schedule routine (see Algorithm 3): jobs are sorted by non-decreasing finish times, and we assign them in this order to processors (cloud or edge). The current assignment of jobs to processors and communications is then stored in the arrays executing, upcom and downcom, and nextFinish is the next event given this assignment (i.e., which communication or computation will complete first).

The complexity of the routine that runs at each event is in $O(n(P_e + P_c))$, and since there are at most $4n$ events, the worst-case complexity is $O(n^2(P_e + P_c))$.

We consider the same example as in Section 5.2: $P_c = 1$, $P_e = 2$, $n = 4$, and both edge speeds $s = 0.5$. Jobs $J_1$ and $J_2$ are such that $r = 0$, $w = 4$, $up = 2$, $dn = 1$, and $o_1 = 1$, $o_2 = 2$. $J_3$ and $J_4$ are such that $r = 2$, $w = 2$, $up = 2$, $dn = 1$, and $o_3 = 1$, $o_4 = 2$. The schedule returned by SRPT is given in Figure 3, with max-stretch $\frac{11}{7}$. When the first two jobs are released at time 0, their SRPT is reached by sending the jobs to the cloud. However, once $J_1$ has been sent to the cloud, the uplink communication channel of the cloud is busy, and $J_2$ is hence executed locally at $E_2$. The next events occur at time 2, with the release of the new jobs. Remaining processing times are updated with the work already done ($up^c_1 = 0$ since $J_1$ has completed its uplink communication). $J_3$ and $J_4$ then have the shortest remaining processing time for an execution on the edge, and $J_4$ preempts $J_2$. Since the uplink communication channel is now available, $J_2$ is sent to the cloud. At time 6, $J_1$ has a finish time of 7 and its downlink communication is scheduled first, while $J_2$ would finish the earliest by [...]

Algorithm 1: SRPT heuristic

Data: Job and platform parameters.

    releases ← PriorityQueue(i, r_i)   // jobs ordered by release dates
    executing ← array(None)            // job currently executed on each processor, 1 ≤ j ≤ P_e + P_c
    upcom ← array(None)
    downcom ← array(None)
    Jobs ← {}
    t ← 0
    nextFinish ← None
    i ← releases.pop()
    nextEvent ← nextRelease ← (r_i, E_r(i))
    while nextEvent ≠ None do
        /* Actualize state of jobs */
        (t_new, e) ← nextEvent
        if e = E_f(i) then
            Jobs.remove(i)
        if e = E_r(i) then
            Jobs.insert(i)
            i' ← releases.pop()
            nextRelease ← (r_i', E_r(i'))
        /* Update remaining work */
        execute(t, t_new, {w^c_(i,k), w^e_i, up^c_i, dn^c_i} for i ∈ Jobs, 1 ≤ k ≤ P_c)
        /* Compute all possible finish times and SRPT schedule */
        finishingTimes ← List()
        t ← t_new
        foreach i ∈ Jobs do
            t^e_i ← w^e_i / s_(o_i)
            finishingTimes.insert(t + t^e_i, (i, o_i))
            foreach p_k ∈ P_c do
                t^c_(i,k) ← w^c_(i,k) + up^c_i + dn^c_i
                finishingTimes.insert(t + t^c_(i,k), (i, P_e + k))
        executing, upcom, downcom, nextFinish ← SRPT-schedule(finishingTimes)
        if nextRelease ≠ None then
            nextEvent ← min(nextRelease, nextFinish)
        else
            nextEvent ← nextFinish   // only completion events remain

Algorithm 2: Update of remaining work

Data: Two timestamps t_1, t_2; remaining work and communication at time t_1.
Result: Remaining work and communication at time t_2.

    execute(t_1, t_2, {w^c_(i,k), w^e_i, up^c_i, dn^c_i}):
        foreach j ∈ P_e do
            i ← executing[j]
            w^e_i ← w^e_i − s_(o_i) · (t_2 − t_1)
        foreach k ∈ P_c do
            i ← executing[P_e + k]
            w^c_(i,k) ← w^c_(i,k) − (t_2 − t_1)
        foreach k ∈ P_c do
            i ← upcom[k]
            up^c_i ← up^c_i − (t_2 − t_1)
            i ← downcom[k]
            dn^c_i ← dn^c_i − (t_2 − t_1)

Figure 3: Example of a schedule obtained with SRPT.

Algorithm 3: Scheduling between two events using SRPT

Data: List of tuples (t, (i, j)): t is the remaining duration needed to finish executing job i on processor j.
Result: Job allocations to processors and communication channels, and next completion event.

    SRPT-schedule(finishingTimes):
        executing ← array(None)            // for processors 1 ≤ j ≤ P_e + P_c
        UpCloud, DownCloud ← array(None)   // for 1 ≤ k ≤ P_c
        UpEdge, DownEdge ← array(None)     // for 1 ≤ j ≤ P_e
        nextFinish ← (∞, E_f(−1))
        sort(finishingTimes)
        foreach (t, (i, j)) ∈ finishingTimes do
            assigned ← false
            if executing[j] = None and i is not in executing and (j ≤ P_e or up_i = 0) then
                executing[j] ← i
                event ← (t, E_f(i))
                assigned ← true
            else
                if j > P_e and UpCloud[j − P_e] = None and UpEdge[o_i] = None and up_i ≠ 0 then
                    UpCloud[j − P_e] ← i
                    UpEdge[o_i] ← i
                    event ← (t, E_up(i))
                    assigned ← true
                if j > P_e and DownCloud[j − P_e] = None and DownEdge[o_i] = None and up_i = 0 and w^c_i = 0 then
                    DownCloud[j − P_e] ← i
                    DownEdge[o_i] ← i
                    event ← (t, E_dn(i))
                    assigned ← true
            if assigned then
                nextFinish ← min(nextFinish, event)
        if nextFinish = (∞, E_f(−1)) then
            nextFinish ← None
        return executing, UpCloud, DownCloud, nextFinish   // read as executing, upcom, downcom, nextFinish in Algorithm 1

5.4 Stretch-so-far EDF

For one processor in the offline case, the algorithm in [4] finds the optimal maximum stretch in polynomial time. This algorithm was extended to the online case, yielding an algorithm that is $\Delta$-competitive, where $\Delta$ is the ratio between the longest and the shortest job. The core idea of this algorithm consists in giving deadlines to jobs, based on an estimate of the maximum stretch. This estimate can change over the course of the algorithm, and is computed by finding the lowest achievable stretch through a binary search on the stretch. More precisely, these deadlines are computed as $d_i = r_i + S_e \times t_i$, where $S_e$ is the stretch estimate. This estimate is computed as $S_e = \alpha \times S^*_{\leq k}$, where $S^*_{\leq k}$ is the optimal stretch of the jobs that have already been released (computed through binary search), and $\alpha$ is a parameter of the algorithm. Then, the algorithm schedules the jobs by taking the earliest of these deadlines first. With a well-chosen $\alpha$, this algorithm has a competitive ratio that is a function of $\Delta = \frac{w_{max}}{w_{min}}$. More specifically, for $\alpha = 1$, it is $\Delta$-competitive, which is optimal up to a constant factor. However, the result can be better if $\Delta$ is known in advance. This algorithm is called Stretch-so-far Earliest-Deadline-First, since it is the Earliest Deadline First algorithm with, as input, deadlines derived from the optimal stretch so far.

In the original problem, this algorithm relies on the fact that Earliest Deadline First (EDF) is optimal with a single processor. However, this is not the case anymore when we consider the MinMaxStretch-EdgeCloud problem, since we need to account for uplink and downlink communications for each job. We provide an example of the non-optimality of EDF with two jobs and one cloud processor:

• $J_1$ with $up_1 = 3$, $w_1 = 1$, $dn_1 = 0$, and deadline $d_1 = 5$;
• $J_2$ with $up_2 = 1$, $w_2 = 3$, $dn_2 = 0$, and deadline $d_2 = 6$.

EDF establishes that the job with the highest priority is $J_1$, since $d_1 < d_2$. Therefore, it first sends $J_1$ to the cloud, and hence $J_2$ starts its uplink communication at time 3 and completes at time $3 + 1 + 3 = 7 > d_2$. However, by executing $J_2$ first, $J_1$ is also able to meet its deadline (uplink communication starting at time 1 while $J_2$ is executed on the cloud, and execution of $J_1$ at time 4).

Still, we propose an adaptation of this algorithm for MinMaxStretch-EdgeCloud. However, while EDF tells us which job to execute, we now also have to decide on which processor to execute the job with the highest priority. We decide to execute the job on the processor that minimizes its stretch (or, equivalently, its finishing time). This algorithm is called SSF-EDF.

At each event corresponding to the release of a job (event $E_r(i)$), we compute deadlines for each of the jobs currently on the platform, accounting for the remaining working time of a job if part of it has already been executed. These deadlines are computed using a target stretch and $\alpha = 1$, and a binary search is done to try different stretches. Indeed, the goal is to find the best possible achievable stretch at the current time, but as already pointed out, the EDF strategy is no longer optimal and we may not get the optimal stretch. In some cases, a stretch might not be achievable, while a smaller stretch would have been possible with another scheduling strategy.

Given a set of deadlines (and hence a target stretch), we assign the job with the smallest deadline to the processor where it completes the earliest (either the origin edge processor of the job, or one of the $P_c$ cloud processors), and we then iterate on the other jobs, sorted by non-decreasing deadlines. If all deadlines can be respected, a smaller stretch might be considered; when the binary search is done, we update the remaining processing and communication times, similarly to SRPT, with the assignment returned by SSF-EDF until the next event.
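The binary search itself can be sketched as follows (our own C++ illustration; `feasible` stands for the EDF-based test that tries to schedule all current jobs within the deadlines $d_i = r_i + S_e \times t_i$ induced by a candidate stretch $S_e$):

    #include <functional>

    // Binary search for the smallest achievable stretch estimate, up to a
    // relative precision eps; feasible(S) runs the EDF assignment with
    // deadlines d_i = r_i + S * t_i and reports whether all deadlines hold.
    double searchStretch(const std::function<bool(double)>& feasible,
                         double lo, double hi, double eps) {
        // Assumes feasible(hi) is true and lo >= 1 (a stretch is at least 1).
        while ((hi - lo) / lo > eps) {
            double mid = (lo + hi) / 2.0;
            if (feasible(mid)) hi = mid;  // deadlines met: try a smaller stretch
            else               lo = mid;  // infeasible: increase the target
        }
        return hi;
    }

This matches the stated complexity: each feasibility trial costs $O(n(1 + P_c))$, and the search performs $O(\log(\frac{1}{\epsilon}))$ trials.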

The complexity of the SSF-EDF algorithm is $O(n(1 + P_c))$ at each event, as for the greedy algorithm (compute for each job its shortest execution time, locally or on one of the cloud processors), but we must re-execute the routine as many times as needed by the binary search. So the actual worst-case complexity is $O(n^2 P_c \log(\frac{1}{\epsilon}))$, where $\epsilon$ is the relative precision we want to keep on the estimate of the stretch (the binary search takes a time $\log(\frac{1}{\epsilon})$).

If we consider the same example as in Sections 5.2 and 5.3, we find that SSF-EDF outputs the same schedule as SRPT for this specific example, with the same maximum stretch of $\frac{11}{7}$ (see Figure 3).

6 Simulations

In order to test the efficiency of the proposed algorithms and to compare them with the Edge-Only strategy, we implemented and simulated them in different scenarios, using parameters from real edge-cloud platforms. All heuristics are implemented in C++, and the code can be downloaded at github.com/Redouane-Elghazi/Max-Stretch-Minimization-on-an-Edge-Cloud-Platform.git for reproducibility purposes.

6.1 Simulation setup

We consider two main problem instances: instances with random scenarios to experiment with the various parameters, and instances with parameters coming from [23], inspired by a real-life situation.

Random instances: We first consider a platform with:

• 20 cloud processors;
• 10 slow edge processors with speed $s = 0.1$;
• 10 fast edge processors with speed $s = 0.5$.

The jobs are generated using a uniform distribution for the execution and communication times, as well as for the release date and the origin processor. Both execution and communication times follow the same distribution. The parameters of the distribution for communication are tied to the parameters of the distribution for execution, through the notion of Communication/Computation Ratio (CCR). More precisely, both distributions are chosen so that the ratio between their expected values is equal to some value determined in advance. We consider CCRs ranging from 0.1 (compute-intensive scenario) to 10 (communication-intensive scenario).

Kang instances: These instances are meant to be closer to a real-life situation, inspired from [23]. They contain different types of edge processors, depending on whether:

• their computational unit is a GPU or a CPU;
• their communication channel is 3G, LTE, or Wi-Fi.

Then, the jobs are created with an execution time and a communication time that is directly inferred from the type of computational unit and the type of communication channel. More precisely, the values are chosen as follows (expected values and speeds according to [23]):

• the execution time follows a normal distribution with mean 6 and relative standard deviation $\frac{1}{3}$;
• the uplink communication time follows a normal distribution with mean $t$ and relative standard deviation $\frac{1}{3}$, where:
  – $t = 95$ if the communication technology is Wi-Fi;
  – $t = 180$ if the communication technology is LTE;
  – $t = 870$ if the communication technology is 3G;
• the downlink communication time is 0 for all jobs.

Also, the speed of an edge processor is:

• $\frac{6}{81}$ if the processor computes on a GPU;
• $\frac{6}{382}$ if the processor computes on a CPU.
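For illustration, such a job could be sampled as follows (a C++ sketch under the distributions above; the clamping to non-negative values is our own assumption):

    #include <random>
    #include <algorithm>

    // Sample the execution and uplink times of a Kang-style job.
    // meanUp is 95 (Wi-Fi), 180 (LTE), or 870 (3G); relative std dev is 1/3.
    struct KangJob { double w, up, dn; };

    KangJob sampleKangJob(std::mt19937& gen, double meanUp) {
        std::normal_distribution<double> exec(6.0, 6.0 / 3.0);
        std::normal_distribution<double> uplink(meanUp, meanUp / 3.0);
        KangJob j;
        j.w  = std::max(0.0, exec(gen));    // execution time, mean 6
        j.up = std::max(0.0, uplink(gen));  // uplink communication time
        j.dn = 0.0;                         // downlink time is 0 for all jobs
        return j;
    }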

Release dates and load: With both instance types, the distribution of the release dates is chosen to control the load on edge processors, i.e., the average number of jobs originating from an edge processor that are simultaneously in the system. Hence, for a load $\ell$, the maximum release date is set to
$$\frac{\sum_{i=1}^{n} w_i}{\ell \sum_{j=1}^{P_e} s_j}:$$
the sum of the work over the aggregated speed is the average execution time using all processors, and dividing this ratio by, say, $\ell = 0.1$ stretches release times by a factor of ten, thereby decreasing the load accordingly. By default, we consider a load of 5% ($\ell = 0.05$): the processors are then able to cope with the jobs before new jobs arrive in the system, while we still observe some interesting collisions on both the edge and the cloud.
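In code, this choice of the maximum release date is straightforward; the helper below is a sketch with our own names, where work holds the $w_i$ and edgeSpeeds the $s_j$ of the $P_e$ edge processors.

```cpp
#include <numeric>
#include <vector>

// Maximum release date for a target load: total work divided by the
// aggregated edge speed gives the average time to process all jobs;
// dividing by the load spreads the releases over a longer horizon.
double maxReleaseDate(const std::vector<double>& work,
                      const std::vector<double>& edgeSpeeds, double load) {
    double totalWork  = std::accumulate(work.begin(),  work.end(),  0.0);
    double totalSpeed = std::accumulate(edgeSpeeds.begin(),
                                        edgeSpeeds.end(), 0.0);
    return totalWork / (load * totalSpeed);
}
```

With the default $\ell = 0.05$, release dates are thus spread over twenty times the average time needed to drain the whole instance.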


Figure 4: Maximum stretch and execution time with a Communication/Computation Ratio of 0.1 (Random instances).


Figure 5: Maximum stretch and execution time with a Communication/Computation Ratio of 1 (Random instances).


Figure 6: Maximum stretch and execution time with a Communication/Computation Ratio of 10 (Random instances).


Figure 7: Maximum stretch and execution time as a function of the CCR, with n = 4000 jobs (Random instances).


6.2 Simulation results

For each experiment, we report the maximum stretch (with and without the Edge-Only heuristic, since it often leads to much higher, out-of-range stretch values), as well as the execution time of each algorithm (the time to compute the schedule). Each point corresponds to the average over 1000 instances created with the same parameters.

Random instances: We first run the heuristics on random instances, with three different values of the CCR (0.1, 1, and 10), and study the impact of the number of jobs (100 ≤ n ≤ 20,000). As seen in Figures 4, 5, and 6, the proposed heuristics lead to much better stretches than the Edge-Only strategy, in particular when communications are not too costly (CCR of 0.1 or 1); indeed, when the communication cost increases, it is better to handle most jobs on the edge. SSF-EDF almost always gives the lowest stretch. SRPT is very close to SSF-EDF, and can even become better when communications are cheap (CCR of 0.1). Even though Greedy was designed to minimize the maximum stretch, it always leads to higher stretches than the more sophisticated heuristics SRPT and SSF-EDF.

In terms of execution time, SRPT and Greedy are similarly fast, while SSF-EDF and Edge-Only are significantly slower. Indeed, these latter two heuristics must run a binary search at each new job release to fix deadlines, hence they take much longer to execute. Overall, all heuristics have an expected execution time that scales linearly, and not quadratically, as a function of n; only the constants vary.

We also study in more depth the impact of the CCR in Figure 7, where the number of jobs is set to n = 4000 and the CCR varies from 0.1 to 10. This confirms that the new heuristics gain more over Edge-Only for small values of the CCR, because it is then very useful to send jobs to the cloud. Among the other algorithms, SSF-EDF is the best in all scenarios, with SRPT being very close; their stretch exceeds two only for the largest values of the CCR. Greedy is slightly behind, with stretches going up to 3.5. Execution times, however, are not impacted by the CCR: they remain roughly constant for each algorithm.

Finally, we investigate scenarios with a higher load, with a load $\ell$ up to 2, except for Edge-Only, which becomes too costly since all jobs compete on the edge. We focus on the instance with a CCR of 1 and n = 1000 jobs (Figure 8) or n = 4000 jobs (Figure 9). Again, SSF-EDF is clearly the best, and this is even more striking when the load increases: the stretch of SRPT and Greedy drastically increases, while it stays under three for SSF-EDF. It is also interesting to note that Greedy becomes slightly better than SRPT with an increased load, in particular with n = 1000. Execution times follow the same trend as before, increasing with the load, in particular for SSF-EDF and Greedy; in this setting, Greedy becomes as costly as SSF-EDF (around 10 seconds) with a load $\ell = 2$ and n = 4000.

Kang instances: For the Kang instances, the CCR is dictated by the platform parameters, so we only vary the number of jobs, in two different scenarios


Figure 8: Maximum stretch and execution time with n = 1000 jobs and a CCR of 1 (Random instances).


Figure 9: Maximum stretch and execution time with n = 4000 jobs and a CCR of 1 (Random instances).


Figure 10: Maximum stretch and execution time: Kang instance with 20 mixed edge processors and 10 cloud processors.


Figure 11: Maximum stretch and execution time: Kang instance with 100 mixed edge processors and 10 cloud processors.


with 20 or 100 edge processors, while keeping ten cloud processors (see Figures 10 and 11). Again, SRPT and SSF-EDF are clearly the best heuristics, and Edge-Only cannot keep up when the number of jobs increases. It is interesting to see that with more edge processors (Figure 11), and hence more competition for cloud resources, Greedy gets a stretch that is close to the ones achieved by SRPT and SSF-EDF.

The execution times are much higher in the scenario with 100 edge processors, and, as before, Greedy and SRPT are much faster than Edge-Only and SSF-EDF.

Summary: Overall, we conclude that SRPT is always a better option than Greedy for lightly loaded systems, since it leads to smaller stretches at approximately the same cost. Greedy, however, may outperform SRPT when the system is heavily loaded, but it then also becomes more costly. Choosing between SRPT and SSF-EDF is a matter of trading off quality against execution time: SSF-EDF is more costly than SRPT, but also more efficient. Edge-Only is a costly solution that does not exploit the cloud, and the comparison with this strategy highlights the importance of using cloud resources when available, in particular when communication costs are not too high.

7 Conclusion

We have tackled the problem of scheduling independent jobs on an edge-cloud platform, with the goal of minimizing the maximum stretch of jobs. Jobs are produced by edge computing units, and they can be processed either locally (at a slow speed) or delegated to the cloud (at the price of additional communications). We have designed a general model, accounting for a realistic communication model and formalizing the scheduling constraints of this particular setting. While establishing the problem complexity, we have encountered the problem of minimizing the max-stretch without release dates on a homogeneous platform, whose complexity was left open in the literature; we have proven that it is NP-hard, filling a gap in fundamental scheduling complexity results. The NP-hardness of the problem in the edge-cloud setting is then a corollary of this result.

We designed heuristic algorithms to address the problem in the general online setting, and we assessed their performance through simulations, using parameters coming from real-world edge-cloud platforms. The algorithms delegating jobs to the cloud achieve much better stretches than the competitor Edge-Only approach, in particular when communication costs are not too high. The proposed SSF-EDF algorithm always achieves very good stretches. SRPT is also very efficient and computes its schedules more rapidly, hence it may be an interesting alternative to SSF-EDF.

As future work, it would be very interesting to derive theoretical bounds for the proposed online algorithms (competitive results), for instance for some specific job distributions. Furthermore, we would like to investigate a more complicated framework where cloud processors are not available full-time: in practice, a realistic but intricate setting is one where cloud processors may be dynamically requested by other applications during certain time intervals.

Acknowledgments: We thank Denis Trystram for several fruitful discussions on edge computing.

References

[1] B. Awerbuch, Y. Azar, S. Leonardi, and O. Regev. Minimizing the flow time without migration. SIAM J. on Computing, 31(5):1370–1382, 2002.

[2] L. Becchetti, S. Leonardi, and S. Muthukrishnan. Average stretch without migration. J. Comput. Syst. Sci., 68:80–95, 2004.

[3] M. A. Bender, S. Chakrabarti, and S. Muthukrishnan. Flow and stretch metrics for scheduling continuous job streams. In Proc. of the 9th ACM-SIAM Symp. on Discrete Algorithms, SODA'98, pages 270–279, 1998.

[4] M. A. Bender, S. Muthukrishnan, and R. Rajaraman. Improved algorithms for stretch scheduling. In Proc. of the 13th Annual ACM-SIAM Symp. on Discrete Algorithms, SODA'02, pages 762–771, 2002.

[5] M. A. Bender, S. Muthukrishnan, and R. Rajaraman. Approximation algorithms for average stretch scheduling. Journal of Scheduling, 7:195–222, 2004.

[6] P. Bhat, C. Raghavendra, and V. Prasanna. Efficient collective communication in distributed heterogeneous systems. Journal of Parallel and Distributed Computing, 63:251–263, 2003.

[7] A. Brogi, S. Forti, C. Guerrero, and I. Lera. How to place your apps in the fog: State of the art and open challenges. CoRR, 1901.05717, 2019.

[8] P. Brucker. Scheduling Algorithms. Springer-Verlag, 2004.

[9] H. Casanova, F. Desprez, and F. Suter. Minimizing stretch and makespan of multiple parallel task graphs via malleable allocations. In 39th International Conference on Parallel Processing, pages 71–80, 2010.

[10] J. Celaya and L. Marchal. A fair decentralized scheduler for bag-of-tasks applications on desktop grids. In 10th IEEE/ACM Int. Conf. on Cluster, Cloud and Grid Computing, pages 538–541, 2010.

[11] C. Chekuri and S. Khanna. Approximation schemes for preemptive weighted flow time. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, STOC'02, pages 297–305, 2002.

[12] P. Chrétienne, E. G. Coffman Jr., J. K. Lenstra, and Z. Liu, editors. Scheduling Theory and its Applications. John Wiley and Sons, 1995.
