Université libre de Bruxelles Institutional Repository
PhD Thesis, APA citation:
Berten, V. (2007). Stochastic approach to brokering heuristics for computational grids (Unpublished doctoral dissertation). Université libre de Bruxelles, Faculté des Sciences – Informatique, Bruxelles.
Available at permalink: https://dipot.ulb.ac.be/dspace/bitstream/2013/210707/4/31518d53-cb07-4a2a-82e8-2e89b820024c.txt
This PhD thesis has been digitized by the Université libre de Bruxelles. Any author who objects to its being made available online in DI-fusion is invited to contact the University (di-fusion@ulb.be).
If a native electronic version of the thesis exists, the University can guarantee neither that the present digitized version is identical to the native electronic version, nor that it is the definitive official version of the thesis.
DI-fusion is the Institutional Repository of Université libre de Bruxelles; it collects the research output of the University, available on open access as much as possible. The works included in DI-fusion are protected by the Belgian legislation relating to authors’ rights and neighbouring rights.
Any user may, without prior permission from the authors or copyright owners, for private use or for educational or scientific research purposes, to the extent justified by the non-profit aim pursued, read, download or reproduce on paper or on any other medium the articles or fragments of other works available in DI-fusion, provided:
The authors, title and full bibliographic details are credited in any copy;
The unique identifier (permalink) for the original metadata page in DI-fusion is indicated;
The content is not changed in any way.
It is not permitted to store the work in another database in order to provide access to it; the unique identifier (permalink) indicated above must always be used to provide access to the work. Any other use not mentioned above requires the authors’ or copyright owners’ permission.
ULB
FACULTÉ DES SCIENCES
Stochastic Approach to Brokering Heuristics for Computational Grids
Thesis presented for the degree of Doctor of Sciences
Thesis supervisor: Joël Goossens
Université Libre de Bruxelles
Vandy BERTEN
Academic year 2006-2007
Thesis publicly defended in Brussels, 8 June 2007.
Acknowledgements
A work of such scope is never a personal achievement. Even if a single person puts his name on the cover, he is very far from being its only author. I could therefore not begin this work without thanking those who, as much as myself, are at its origin.
Among all those who took part in these four years of labour, one has certainly contributed more than any other: Joël Goossens, my thesis supervisor. He chose to trust me nearly five years ago, by agreeing first to supervise my master's thesis, and then my doctoral thesis. Our weekly meetings, his regular readings of my writings even when I strayed from his main research topics, his insistence on having me meet foreign researchers, his rigour and his experience steered my work in a direction which, I hope, has proved worthy of the trust he placed in the first "PhD student" I had the privilege of being.
A few months after the beginning of my thesis, I had the chance to meet the AlGorille team in Nancy, and more particularly Emmanuel Jeannot, who welcomed me on several occasions over these last years. Our varied discussions allowed me to broaden my interests, and I can only be grateful to him.
Later, I had the pleasure of collaborating with several researchers of the IMAG in Grenoble, and more especially with Bruno Gaujal, to whom I owe the ideas underlying the second part of this work. He did me the honour of hosting me several times, for a total of nearly four months. Besides the delight of life at the foot of the mountains, these stays in his team taught me an enormous amount about the world of research and collaboration. I would also like to thank Jean-Marc Vincent and Jérôme Vienne, for their perfect simulation tool, which they adapted many times to the needs of our experiments. It is moreover thanks to Bruno that I was able to access Grid'5000, and to consume nearly 40,000 CPU hours there.
All my gratitude also goes to Raymond Devillers, for his many meticulous re-readings of my writings, and all the fascinating discussions that followed. As many members of the department can testify, he obviously contributed greatly to the scientific rigour, the precision, and the handling of corner cases. There is the thesis before Devillers, and after Devillers...
I certainly have to thank all those who, besides Messrs Goossens, Jeannot, Gaujal and Devillers, agreed to be part of the jury: Olivier Markowitch and Guy Louchard of the ULB, and Pierre Manneback of the FPMs. I thank more particularly Mr Louchard, to whom I owe a large part of the proof in Appendix B of this work.
Outside the academic world, it goes without saying that I owe a great deal, not to say everything, to my parents, my two brothers and my sister; curiosity, the taste for discovery, the interest in science, the pursuit of precision, the pleasure of shared knowledge: these are all values I owe to them, and without which this work could not have come to fruition.
Of course, I do not forget either those who surrounded me during these last years: bémelois, colonnards, cousins, taizéens, members of the DI, and many others!
CONTENTS 5
Contents
1 Introduction to Grid Brokering 13
1.1 Motivations and Context... 14
1.1.1 Outline... 16
1.2 Definitions and Grid Modeling... 17
1.2.1 Queuing Model... 19
1.2.2 Scheduling Model... 25
1.2.3 System Load... 28
1.2.4 System State... 30
1.2.5 Underlying Markov Chain... 31
1.3 Brokering... 33
1.3.1 Open-loop and Closed-loop... 33
1.3.2 Memoryless and Historical Information... 34
1.3.3 Deterministic and Probabilistic... 34
1.3.4 Mathematical Model... 35
Probabilistic Memoryless Open-loop Brokering... 35
Deterministic Memoryless Closed-loop Brokering... 36
1.3.5 Bernoulli Brokerings... 36
1.3.6 LCB Policies... 38
1.4 Examples... 41
1.4.1 EGEE... 41
1.4.2 NorduGrid... 43
1.4.3 Grid'5000... 43
1.4.4 GridBus... 44
1.5 Cost Function... 45
1.6 Concepts... 48
1.6.1 Resolving Markov Decision Processes Using Dynamic Programming... 48
Minimizing the Policy... 50
Value Iteration... 50
Policy Iteration... 51
1.6.2 Perfect Simulation... 51
2 Random Brokering 53
2.1 Introduction and Model... 54
2.1.1 Dispatching the Jobs... 55
2.1.2 Numerical Simulations... 56
2.2 Sequential Systems... 58
2.2.1 System Load... 58
2.2.2 Queue Size... 59
Case v = 1... 59
Case v < 1... 60
Case v > 1... 64
Arrivals... 64
Departures... 65
Number of Jobs in the System... 67
Experimental Results... 69
2.2.3 Used CPUs... 74
2.2.4 Resorption Time... 75
2.2.5 Slowdown... 76
Job Length Distributions... 77
Case v < 1... 79
Case v > 1... 81
Slowdown for a Job Submitted at Time θ... 82
Average Slowdown Until the Last Finished Job With v > 1... 85
Experimental Results... 89
2.3 Fully-synchronous Parallel Systems... 93
2.3.1 Queue Size... 95
Case v < v̂1... 95
Case v > v̂1... 95
Experimental Results... 97
2.3.2 Used CPUs... 99
Case v < v̂1... 99
Case v > v̂1... 99
2.3.3 Resorption Time... 102
2.3.4 Slowdown... 103
Experimental Results... 106
2.4 Conclusion... 108
2.4.1 Summary of Contribution... 108
Queue Size... 108
Average Number of Used CPUs... 109
Resorption Time... 109
Measured Slowdown... 109
3 Index Based Brokering 111
3.1 Introduction... 112
3.1.1 Mathematical Model... 114
3.1.2 Optimal Brokering... 115
3.1.3 Intuitive Justification of the Index Strategy... 117
3.1.4 Cost Optimization Problem... 122
3.1.5 Mathematical Justification of the Whittle-Gittins Index... 123
3.2 Threshold Policy on a Single Queue with Rejection... 127
3.2.1 Mathematical Formulation... 131
3.2.2 Properties of Optimal Policy... 133
3.2.3 Computing the Optimal Threshold... 141
3.3 Algorithmic Improvements... 144
3.3.1 Algebraic Simplifications... 144
Local Gain... 144
Admissibility Condition... 144
Total Gain... 145
3.3.2 Improvement of the Admissibility Check... 146
3.3.3 Reducing the Problem Size... 147
3.3.4 Computing the Index Function... 149
3.3.5 Parameters Dependence and Numerical Issues... 150
Discount Cost (α)... 151
Arrival Rate (λ)... 152
Number of Servers (s)... 152
Precision (ε)... 153
3.4 Complexity and Benchmarks... 155
3.4.1 Value-Determination Operation (Solving Jθ)... 155
3.4.2 Policy-Improvement Routine... 155
Maximal Complexity... 155
Average Complexity... 155
3.4.3 Finding θ(R)... 156
Maximal Complexity... 156
Average Complexity... 156
Improving the Maximal Complexity... 156
3.4.4 Dichotomy... 157
First Phase... 158
Second Phase... 158
Improved Maximal Complexity... 158
3.4.5 Space Complexity... 159
3.4.6 Benchmarks... 159
3.5 Numerical Experiments... 161
3.5.1 Strategies... 161
3.5.2 Uniprocessor Systems... 162
3.5.3 Multiprocessor Systems... 164
3.5.4 Robustness... 164
3.5.5 Sojourn Time Distribution... 169
3.6 Realistic Experiments... 172
3.6.1 SimGrid Software... 172
3.6.2 A Grid Model Using SimGrid... 172
3.6.3 Traces... 174
3.6.4 Experimental Scenarios... 175
Several Inputs... 176
The Effect of Heterogeneity... 177
Information Delays... 178
3.6.5 Sojourn Time Distribution... 180
3.7 Conclusions... 181
4 Index Brokering of Batch Jobs 183
4.1 Introduction... 184
4.2 Batch Arrivals with Known Distribution... 185
4.2.1 Mathematical Model... 185
4.2.2 Threshold Policy: Differences with Sequential Jobs... 186
Properties of Optimal Threshold Policy... 188
4.2.3 Computing the Optimal Threshold: Algorithm and Optimizations... 190
Algebraic Simplifications... 191
Local Gain... 191
Admissibility Condition... 191
Global Gain... 191
Improvement of the Admissibility Check... 192
Reducing the Problem Size... 193
4.2.4 Computing the Index Function... 194
4.2.5 Complexity... 194
Value-Determination Operation (Solving Jθ)... 194
Policy-Improvement Routine... 194
Finding θ(R)... 194
Dichotomy... 195
Benchmarks... 196
4.2.6 Numerical Experiments... 197
Impact of the Architecture... 197
Impact of the Job Width Distribution... 198
Robustness on Load Variations... 199
Robustness on Job Width Distribution... 200
4.3 Batch Arrivals with Known Sizes... 203
4.3.1 Mathematical Model... 203
4.3.2 Bellman's Equation... 203
4.3.3 Algorithms... 206
Computing the Best Thresholds... 206
Computing the Index... 207
4.3.4 Simulations... 208
4.4 Batch Arrivals with Synchronous Departures... 214
4.4.1 State Transitions... 215
Job Arrival... 215
Process Departure... 215
4.4.2 Alternative Formulation... 216
Case x > 0... 216
Case x = 0... 216
4.4.3 Bellman's Equation... 216
4.4.4 Algorithm... 217
4.5 Parallelization... 219
4.5.1 Interval Division... 220
Avoiding Duplication... 222
Improving the Interval Division... 222
4.5.2 Pool of Rejection Cost Values... 223
Computing Processes... 223
Coordinator Process... 224
Pool of Values... 224
Distributed Dichotomy... 225
Cleaning the Pool... 225
Filling the Pool... 226
4.6 Conclusions... 228
5 Contribution, Future Works and Conclusion 229
5.1 Summary of Contribution... 229
5.1.1 Random Brokering... 229
5.1.2 Index Brokering... 230
5.2 Open Questions and Future Work... 232
5.2.1 Random Brokering (Chapter 2)... 232
Standard Deviation and Error... 232
Missing Experimentations... 232
Other Distributions... 233
Non-Saturated Parallel Systems... 233
Asynchronous and Semi-Synchronous Systems... 233
Slowdown on Submitted Jobs... 233
Missing Formal Proofs... 233
5.2.2 Index Brokering (Chapter 3)... 234
Alternative Cost Strategy... 234
Advanced Realistic Simulations... 235
Implementation in Real/Production Environment... 235
Proof of Conjecture 3.12... 235
5.2.3 Batch Jobs (Chapter 4)... 235
Improvement of BatchK Algorithm... 235
Optimal Strategy... 236
Semi-Synchronous Systems (BatchS)... 236
Parallelization Implementation... 236
Full Proof of Convexity... 236
5.3 Final Conclusion... 237
A Splitting of Stochastic Process 241
B Used CPUs: Parallel Case 243
B.1 Case s = 2... 243
B.2 Case s = 3... 245
B.3 General Case... 246
B.4 Worst Case... 249
B.5 Equidistributed Case... 251
C Convexity 259
C.1 Sequential Case... 262
C.1.1 Case x < B - 2... 263
C.1.2 Case x = B - 2... 265
C.2 Batch Case... 266
C.2.1 Case x < B - 2... 266
C.2.2 Case x = B - 2... 268
C.2.3 Case x = B - 1... 269
C.2.4 Case B < x < B + K - 3... 270
D Solving a (K+2)-diagonal System 271
References 273
Webography... 273
Personal Bibliography... 273
General Bibliography... 274
Symbols 281
Index 284
Chapter 1
Introduction to Grid Brokering
Chapter Abstract. This chapter introduces the framework in which our work has been undertaken. We present general definitions about grid computing, and more specifically about grid brokering. We establish our mathematical model of a computational grid and give the definitions needed in the remainder of this document.
Chapter Contents
1.1 Motivations and Context
1.2 Definitions and Grid Modeling
1.3 Brokering
1.4 Examples
1.5 Cost Function
1.6 Concepts
1.1 Motivations and Context
When a scientist needs to perform some small computations, he typically launches a dedicated application on a personal computer, which gives the required results. We might say that a user sends a job to a server, and gets back the results of its computation. This model applies as long as the need for computational power is not too high. Once usage reaches the maximal power of the computer at hand, we may consider two ways of tackling the increasing need for computational power. First, buy a new machine, with better performance, more powerful processors, or more processors (e.g. Massively Parallel Processors). Of course, this is limited by the speed at which engineers improve CPU performance or memory accesses. The second way consists in gathering several machines and making them work together.
Infrastructures regrouping several (often homogeneous) machines such as simple personal computers, with some coordination mechanisms, are generally named clusters. According to Buyya [30]:
a cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource.
Clusters are generally owned by organisations such as labs, universities, or companies, and are shared by several users. They contain between a few and several thousand machines. The network interconnecting cluster machines is usually a high-performance LAN (Local Area Network), which allows fast communications and data transfers.
Even if there are no theoretical limitations in terms of the number of machines, clusters have two main disadvantages: first, large clusters are very expensive and very difficult to manage. Secondly, following the evolution of the experiments for which the cluster has been set up, such a parallel system is often underused during some periods, and saturated during others. Those reasons, as well as many others, lead scientists to the idea of Grids: if computers can be put together in order to form a cluster, clusters could be put together in order to form some kind of cluster of clusters, or meta-cluster, or Grid.
A Grid is typically an infrastructure coordinating several clusters spread around a country, or around the world, hosted by partners of a common project, and communicating through a WAN (Wide Area Network), such as the Internet, or some rented links.
As grids are often oriented towards scientific applications, they generally do not only contain clusters, but also mass storage devices allowing experimental results, or various kinds of data, to be stored. Storing huge amounts of data in grid systems requires solving transfer, duplication, security, scheduling and other management issues. Due to the slowness and
the unreliable nature of WAN communication channels, those fields are often far more difficult than they were in clusters, with fast and controllable networks.
Grids have become prevalent infrastructures for intensive computational tasks. The word "Grid" is probably now one of the top five terms in computer science, and the whole scientific community agrees that physicists, biologists, or even mathematicians, will not be able to tackle tomorrow's problems without such computational systems.
But what exactly is a Grid? Ian Foster, one of the so-called "fathers of the grid" [39], describes on his personal webpage a Grid as:
a system that coordinates resources that are not subject to centralized control, using standard, open, general-purpose protocols and interfaces, to deliver nontrivial qualities of service.
With that definition, a lot of distributed systems correspond to grids. In this work, we are mainly going to focus on Grids as being a set of resources, such as clusters, massively parallel processor machines, or mass storage devices, linked together through some wide area network, and managed at a high level by some middleware.
There are mainly two kinds of grids. The first one concerns grids in which the large majority of the work is number crunching (or CPU-bound), and requires neither large amounts of data nor heavy network communication. This kind of grid is usually called a Computational Grid. The second class regroups systems where data is the center of everything; jobs are data-intensive (or IO-bound). They require storing huge amounts of data (often petabytes), which, of course, regularly need to be transferred between computational or storage resources. These grids are named Data Grids. This work mainly focuses on the first family, Computational Grids.
Ideally, Grids should be as easy to use as a simple computer, in the same way that the Internet is as easy to use as if the data were locally available, or the electric power grid (from which the Grid takes its name) is as easy to access as if the power station were just behind the wall.
Nevertheless, if Grids are on the way to becoming efficient and user-friendly systems, computer scientists and engineers still have a huge amount of work to do in order to improve their efficiency. Amongst a large number of problems to solve or to improve upon, the problem of scheduling the work and balancing the load is of first importance.
The main subject of this thesis will be this last problem: how to efficiently or fairly distribute the work across a Computational Grid? This task is usually called Meta-Scheduling (as opposed to Scheduling, which addresses the same subject at a local level) or Brokering. Even if some authors consider the brokering task as not being of prime importance, we will show that the brokering policy can drastically change the system performance, and not necessarily in an expected way.
1.1.1 Outline
This work will be mainly split in two parts. After introducing the mathematical framework on which the remainder of the manuscript is based, we will in Chapter 2 study systems where the grid brokering is done without any feedback information, i.e. without knowing the current state of the clusters when the resource broker (the grid component performing the brokering) makes its decision. We show here how a computational grid behaves if the brokering is done in such a way that each cluster receives a quantity of work proportional to its computational capacity.
This part is based on several publications, in collaboration with Joël GOOSSENS (ULB) and Emmanuel JEANNOT (Loria, Nancy): my DEA thesis [15], a conference paper summarizing my DEA thesis (ISPA04, Hong Kong) [19], a journal paper extending our model to heterogeneous systems (IEEE Transactions on Parallel and Distributed Systems, 2006) [22], as well as some technical reports (INRIA and ULB) [21, 20]. Notice that, with the same authors, we also have a contribution in fault-tolerant real-time systems [23], but as this research is out of the scope of the present work, we have not included it here.
The second part of this work (Chapters 3 and 4) is rather independent from the first one, and consists in the presentation of a brokering strategy, based on Whittle's indices, which tries to minimize the average sojourn time as much as possible. We show how efficient the proposed strategy is for computational grids, compared to the ones popularly used in production systems. We also show its robustness to several parameter changes, and provide several very efficient algorithms for performing the computations required by this index policy.
This second part is the fruit of a collaboration with Bruno GAUJAL (IMAG, Grenoble). On this subject, started by my 3-month sojourn at IMAG in April 2005, we have one journal paper giving the performance of index brokering on realistic workloads (Parallel Computing, Elsevier) [17], one conference paper extending our first model to batch jobs (NetCoop07, Avignon) [18], as well as an INRIA research report [16].
1.2 Definitions and Grid Modeling
We will now formalize several concepts and give definitions about Grids.
In this work, we will consider a quite simple yet fairly realistic model of a computational grid. Our analysis is mainly focused on computational grids: we assume that network latencies and transfer times are negligible compared to computation times, and we thus do not take data localization into account.
The model we now present may seem in some ways quite far from reality. However, we believe that models which are an exact representation of reality are, at least in the case of Grids, mathematically intractable. Analyses, predictions, or strategies cannot be provided in such models due to their complexity. This is why we prefer to consider a simpler model, even if we lose some realism, on which some theoretical analysis is possible. Nevertheless, there are plenty of examples in the literature where theoretical models allow improving the performance of real systems, or predicting their behavior, even when the model simplifies the real environment. Indeed, in this work we will give such an example, where a simple model allows finding a very efficient strategy for a realistic model.
The aim of most computational systems is to run jobs, and this is true for Grids as well. As the literature gives a lot of definitions around this term (job, process, task, subprocess, program, thread, application, operation...), we first need to clarify the concepts which will be the center of our concern in this work.
Definition 1.1 (Process)
A Process is a sequence of computing operations running on a single processor.
We do not consider in this work the subdivision of a process into several threads, as is sometimes done.
Definition 1.2 (Job)
A Job is a set of one or several process(es).
A job is said to be uniprocess (or sequential) when it is composed of one process, and multiprocess (or parallel) when it is composed of one or several processes.
As a computational grid is generally described as a set of Computing Elements, we need to define such a component.
Definition 1.3 (Computing Element, Cluster)
A Computing Element (also called a Cluster) is a set of CPUs or servers, and a single queue, both managed by a scheduler using a specific scheduling policy.
A Computing Element receives jobs, and runs processes on its CPUs.
In this work, we will only consider homogeneous Computing Elements, composed of identical CPUs.
Definition 1.4 (Client)
A Client is a grid user who sends jobs to a grid, in order to run them on some CPU, and to get back results.
One may consider that a grid has only one client, as a client can emulate the work of several clients.
While CPUs need to be managed in a Computing Element (CE), CEs need to be managed in a grid. This is done by a kind of orchestra conductor, usually named the Resource Broker.
Definition 1.5 (Resource Broker, Router, Meta-scheduler)
A Resource Broker (also called a Router or a Meta-scheduler) is a grid component receiving jobs from Clients and sending each of them to a chosen Computing Element.
Definition 1.6 (Brokering, Routing, Meta-scheduling)
Brokering is, in a computational grid, the action of choosing the Computing Element to which an incoming job is sent. The Brokering policy is the way of choosing such an action.
We now have every element we need to provide a first definition of a Grid.
Definition 1.7 (Grid)
A Grid is a set of Computing Elements, linked together through a Resource Broker, receiving jobs from one or several clients.
Definition 1.8 (Computational Grid)
A Computational Grid, in contrast to a Data Grid, is a Grid for which the usage is computation-oriented. Data transfers, data duplication, cache coherency management, large database access, etc. are considered to be negligible with respect to the time spent in actual computations.
In the remainder of this work, we will only focus on Computational Grids.
The next two sections will consist of formal definitions of the concepts defined more verbosely here above. In the next section (1.2.1), we will present a model inspired by the queuing theory community [51, 38, 62]. Then, we will adapt this model in Section 1.2.2, following a point of view coming from the scheduling community [56, 30].
We will finish by comparing those two models, showing that in some respects they are not that different.
1.2.1 Queuing Model
We can now give more formal definitions of several concepts evoked in the previous section. Figure 1.1 shows a general overview of the grid-structure model we consider here. The left figure gives a rather high-level point of view, while the right one goes deeper into the queuing model.
Figure 1.1: Two models of a Computational Grid with Resource Broker. The Grid is composed of M Computing Elements (or queues), queue i being composed of s_i CPUs (or servers) of speed (or rate) μ_i. The system input, with rate λ, is routed amongst the M queues.
Definition 1.9 (CPU, Server)
A CPU or server is a component able to perform computing operations, at a speed given by its service rate μ, with 0 < μ < ∞.
Definition 1.10 (Process)
A Process p is a set of computing operations which, when performed by a CPU of rate μ, will use it continuously during, on average, 1/μ units of time.
As is classically done in queuing theory, processes do not really have a pre-defined execution time; the execution time is fully determined by the server on which the process runs. For instance, if a server provides a service time following an exponentially distributed random variable, one may consider that at each infinitesimal period of time, the server decides whether the process continues or stops its execution, following an exponential distribution.
From now on, servers are assumed to run processes without preemption (a process cannot be interrupted in order to run another process on the same server), and without migration (once a process has started on a server, it stays on this server until the end of its execution).
As we shall study stochastic workloads, the execution time is obtained from a random variable, by rolling a (continuous) die. As long as we use the same random variable, and if the process lengths are not taken into account in any scheduling decision, it is equivalent to assume the die is rolled by the server when the job starts, or by the client at submission time. In the first model, inspired by the queuing theory community, we assume the execution time is chosen by the server starting the process, or even during its execution. In the second model, the execution time is chosen by the client submitting the job.
From the last definition, a process thus does not have a true (absolute) execution time. Its effective execution time is determined by the server (or CPU) on which this process runs. If a server has a rate of μ, this means that the average (effective) execution time of processes running on this server is 1/μ. For instance, in the case of exponentially distributed execution times, the rate of the distribution will be μ.
Definition 1.11 (Job)
A Job is a set of one or several processes, arriving together at a given submission time, and having to be sent to the same CE. The number of processes composing a job is named its width.
Definition 1.12 (Sequential Job, Uniprocess Job)
A Sequential Job (or Uniprocess Job) is a job composed of one single process (job width = 1).
In the case of sequential jobs, we will not differentiate jobs and processes. We will consider jobs as having their own execution time, because a job will have the execution time of its unique process.
In the case of parallel jobs, the execution time of a job can also be defined, but needs some convention. It could be, for instance, the sum or the maximum of its processes' execution times, or the total time during which at least one of its processes was running. Notice that in this last case, the job execution time depends upon the scheduling, which is not always convenient. In this work, we do not consider the execution time of jobs.
Figure 1.2: Execution time of parallel jobs. Left: Asynchronous; Center: Semi-Synchronous; Right: Fully-Synchronous.
Definition 1.13 (Parallel Job, Multiprocess Job)
A Parallel Job (or Multiprocess Job) is a job with one or several independent processes.
By independent processes, we mean that there are neither precedence dependencies nor common resources (and hence no communication) between processes.
We do not enforce parallel jobs to have more than one process; a sequential job is thus a particular case of a parallel job.
Definition 1.14 (Asynchronous Parallel Job)
An Asynchronous Parallel Job is a parallel job for which, once in a cluster queue, every process is independent from the other processes.
Definition 1.15 (Semi-Synchronous Parallel Job)
A Semi-Synchronous Parallel Job is a parallel job with the constraint that its processes have to start their execution simultaneously, but are independent afterwards.
Definition 1.16 (Fully-Synchronous Parallel Job)
A Fully-Synchronous Parallel Job is a parallel job with the constraint that its processes have to start their execution simultaneously, and release the CPUs only when all its processes are completed.
Definition 1.17 (Synchronous Parallel Job)
A Synchronous Parallel Job is a parallel job which is either semi-synchronous or fully-synchronous.
In this work, we assume that jobs are independent, i.e., they do not share common resources except CPUs, and there are no precedence constraints between jobs, as we assumed for processes.
Definition 1.18 (Job Flow)
A Job Flow F_{A_λ, W_K} is an infinite set of jobs, for which:
• The inter-arrival delay follows the probability density function¹ A_λ, with 0 < λ < ∞ and E[A_λ] = λ⁻¹;
• The job width is an integer between 1 and K distributed according to the law W_K.
Notice that another very similar model could have been considered: we can have K arrival processes, the process k following a law A_{λ_k} and submitting only jobs of width k. Those K arrival processes are merged before entering the system. In order to make both models comparable, λ_k needs to be chosen accordingly: if w_k = P[W_K = k], we should have λ_k = λ·w_k. In the case of Poissonian arrivals, merging K processes of arrival rate λ_k is equivalent to a Poissonian process of arrival rate Σ_k λ_k, and as a given arrival in the second model has a probability w_k of corresponding to a job of width k, both models are equivalent.
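The equivalence of the two job-flow models can be checked numerically. The sketch below (our own illustration, not code from this thesis; all names are ours) draws a single Poisson stream whose jobs pick a width from W_K, and separately merges K Poisson streams of rates λ·w_k, then compares empirical rates and width frequencies:

```python
import random

def model_1(lam, w, horizon, rng):
    """One Poisson stream of rate lam; width k chosen with probability w[k-1]."""
    t, jobs = 0.0, []
    while True:
        t += rng.expovariate(lam)
        if t >= horizon:
            return jobs
        jobs.append((t, rng.choices(range(1, len(w) + 1), weights=w)[0]))

def model_2(lam, w, horizon, rng):
    """K independent Poisson streams, stream k of rate lam*w[k-1] emitting
    only width-k jobs; the streams are merged and sorted by arrival time."""
    jobs = []
    for k, wk in enumerate(w, start=1):
        t = 0.0
        while True:
            t += rng.expovariate(lam * wk)
            if t >= horizon:
                break
            jobs.append((t, k))
    return sorted(jobs)

rng = random.Random(0)
w = [0.5, 0.3, 0.2]                 # w_k = P[W_K = k], here K = 3
a = model_1(2.0, w, 10_000, rng)
b = model_2(2.0, w, 10_000, rng)
# Both empirical rates should be close to lam = 2, and both width
# distributions close to w.
print(len(a) / 10_000, len(b) / 10_000)
```

Over a long horizon, both models produce statistically indistinguishable merged flows, as the text argues for the Poissonian case.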
Definition 1.19 (Computing Element)
A Computing Element C_i is
• A set of s_i homogeneous servers (or CPUs) having a service rate μ_i;
• With a queue (or buffer) having a capacity B_i − s_i, where jobs are stored while there are not enough free servers to run them. The capacity of C_i will be B_i;
• Running jobs following a scheduling policy S_i.
A Computing Element C_i is then identified by the tuple {s_i, μ_i, B_i, S_i}.
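As an illustrative sketch only (field and method names are ours, not the thesis'), the tuple {s_i, μ_i, B_i, S_i} of Definition 1.19, together with the capacity-based rejection rule described next, could be carried in code as:

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class ComputingElement:
    servers: int          # s_i: number of homogeneous servers (CPUs)
    rate: float           # mu_i: service rate of each server
    capacity: int         # B_i: total capacity (running + waiting jobs)
    policy: str = "FCFS"  # S_i: local scheduling policy
    queue: deque = field(default_factory=deque)  # all jobs present in the CE

    def accepts(self) -> bool:
        """An incoming job is rejected once the CE already holds B_i jobs."""
        return len(self.queue) < self.capacity

ce = ComputingElement(servers=4, rate=1.5, capacity=10)
print(ce.accepts())   # True while fewer than B_i jobs are present
```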
The capacity of a computing element (CE) means that if this CE already contains as many jobs as its capacity, any incoming job sent to this CE will be rejected. Of course, as much as possible, a resource broker (RB) should avoid sending a job to a CE having already reached its capacity, but, as we will see later, an RB might not be informed of such a situation.
In the following, the local scheduling policy S will by default be FCFS (First Come First Served, see [49] for instance), also known as FIFO (First In First Out), and will be the same for every Computing Element. The principle of the FCFS scheduling policy is to only consider the job at the front of the queue. If there are enough available servers for this job to be executed, the job starts. If not, the CE waits until enough servers are freed (other jobs are not considered meanwhile).

¹The probability density function f : ℝ → ℝ⁺ of a random variable expresses its density of probability; the area under the curve f between two abscissas a and b is the probability that a drawing of the random variable will fall between those two numbers.
Figure 1.3: Example of scheduling, with seven jobs arriving successively, having 3, 2, 2, 2, 4, 2 and 2 processes. Top: FCFS scheduling, Bottom: FF scheduling.
Here are some definitions allowing us to characterize a scheduling policy.
Definition 1.20 (Eligible)
A synchronous parallel job of width n is said to be Eligible if at least n CPUs are free in the same CE.
A process belonging to a sequential or an asynchronous parallel job is said to be Eligible if at least one CPU is free.
Definition 1.21 (Scheduling Policy)
A Scheduling Policy (or a Scheduling Strategy) is a function which gives, for any cluster configuration and if the queue is not empty, the next job (in the case of parallel synchronous jobs) or process (in the case of sequential or parallel asynchronous jobs) to start, eligible or not. The chosen job/process is named the highest priority job or the highest priority process.
Here, the cluster configuration contains any information about the cluster available to the scheduler. This can be simply the number of running and waiting jobs, or, in more complex systems, the arrival, the start and/or the (expected) end time of any job.
Notice that this definition requires the scheduler to be unambiguous: for any cluster configuration, there is exactly one highest priority job/process (except if the queue is empty). But this does not prevent the scheduler from starting several jobs/processes at the same time, because once a job is launched, the cluster configuration changes, and another job can get the highest priority at the same time.
Moreover, the job/process returned by the scheduling policy is not necessarily the next one to effectively start: it could happen that, at some time, the highest priority job/process is not eligible, and that the configuration changes in such a way that the policy gives the highest priority to another job/process before the previous highest priority job/process has started.
Definition 1.22 (Greedy)
A scheduling policy is said to be Greedy (also called expedient) if it never leaves any resource idle intentionally. If a system runs a greedy policy, a resource is idle only if there is no eligible job waiting for that resource.
Notice that, in the parallel case, FCFS or Backfilling [61] are not greedy, while FF (First Fit, see below) is. In the sequential case, FCFS, which corresponds to FF, is greedy.
Definition 1.23 (ASAP)
An ASAP scheduler, standing for as soon as possible, is a scheduler which starts any job/process at the first time this job/process becomes the highest priority job/process and is eligible.
Remark that, in the case of non-preemptive systems, non-ASAP schedulers could be more efficient than ASAP ones. For instance, a strategy could choose to delay the schedule of a job, hoping for the imminent arrival of a "better" job. However, in the remainder of this work, we only consider ASAP schedulers.
In the case of synchronous parallel jobs, another scheduling policy will be considered in this document: FF, standing for First Fit [40]. With this policy, the first eligible job starts when possible. FF and FCFS are of course identical in the case of sequential jobs or asynchronous parallel jobs. A strategy based on FF and using advance reservation is used in OAR, the batch scheduler of Grid'5000 (see [6] and Section 1.4.3). An example of such scheduling policies is given in Figure 1.3 for fully-synchronous parallel jobs.
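The difference between the two policies for fully-synchronous parallel jobs can be sketched in a few lines. The function below is our own minimal illustration (not the thesis' code): FCFS only ever considers the job at the front of the queue, while FF scans for the first eligible job, i.e. the first whose width fits in the free CPUs (Definition 1.20). The widths used are a hypothetical example:

```python
def next_to_start(queue, free_cpus, policy):
    """Return the index in `queue` (a list of job widths) of the job the
    policy would start now, or None if no job can start."""
    if policy == "FCFS":
        # Only the front job is considered; jobs behind it must wait.
        return 0 if queue and queue[0] <= free_cpus else None
    if policy == "FF":
        # Scan the queue and take the first job that fits.
        for i, width in enumerate(queue):
            if width <= free_cpus:
                return i
        return None
    raise ValueError(policy)

queue = [3, 2, 2]                       # widths of queued jobs
print(next_to_start(queue, 2, "FCFS"))  # front job (width 3) does not fit
print(next_to_start(queue, 2, "FF"))    # the width-2 job at index 1 fits
```

With 2 free CPUs, FCFS starts nothing (the front job blocks the queue) while FF starts the first job of width 2, mirroring the behavior shown in Figure 1.3.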
It is because we look at ASAP systems (see Definition 1.23) that a synchronous system is not a particular case of an asynchronous system. Indeed, if we do not impose the system to be ASAP, an asynchronous policy can choose not to start the processes of a job as soon as possible. In particular, the policy could choose to start every process of the same job simultaneously.
Very often, today's cluster schedulers do not use FF or FCFS, but more sophisticated techniques, using preemption (a process can be interrupted and resumed later), migration (a process can be interrupted on a server, and resumed on another server, either on the same cluster — intra-cluster migration — or on another cluster — inter-cluster migration), dynamic priority systems, etc. See for instance [49] for more details. In this thesis, we do not focus on cluster performance, since our aim is to compare various brokering strategies. It is reasonable to assume that the comparison between strategies will not be too much impacted by a better low-level scheduling strategy, because every cluster would improve its performance in a similar ratio. This assumption can of course only be made if strategies are not drastically different from FCFS, and show only marginal divergences.
For instance, comparing brokering strategies when the local scheduling policy is Last Come First Served will probably not give the same conclusion as when the local scheduling policy is FCFS. This is why we only consider simple scheduling strategies, and non-preemptive systems.
Based on the last definition, we can now give a formal definition of the mathematical object we name a Grid System, formalizing the definition of a Computational Grid.
Definition 1.24 (Grid System)
A Grid System G is a tuple {{C_i}_{i∈[1,...,N]}, F_{A_λ, W_K}, B} where
• {C_i}_{i∈[1,...,N]} is a set of N Computing Elements C_i = {s_i, μ_i, B_i, S_i};
• F_{A_λ, W_K} is the arrival job flow;
• B is the Brokering policy.
A formal definition of brokering will be given further in this document.
Figure 1.1 (page 19) gives two visions of a computational grid. On the left-hand side, the Resource Broker (RB) and the Computing Elements (C_i) are seen as black boxes. Clients send jobs to the RB, which dispatches them to some C_i according to its routing policy. The right figure gives a more "queuing theory" oriented approach. A stream of jobs having a rate λ comes into the system. This stream is split in such a way that each job is sent to one Computing Element, which can be seen as a buffer (waiting queue) associated with some servers (CPUs). The s_i servers of the queue C_i have a service rate of μ_i.
1.2.2 Scheduling Model
From a scheduling point of view, each incoming job j is composed of one or several processes p, each having a virtual execution time ℓ_p (or virtual execution length), chosen by the client before submitting the job, possibly from a probability distribution. This virtual execution time means that, on a processor of relative speed ρ = 1, the process p would take ℓ_p units of time. Then, on a CPU with any relative speed ρ, a process p will have an effective execution time of ℓ_p/ρ. Or, if a process runs during ℓ units of time on a ρ_1 CPU, it will run ℓ·ρ_1/ρ_2 units of time on a ρ_2 CPU.
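The speed model above amounts to two one-line conversions. A minimal sketch (function names are ours, purely illustrative):

```python
def effective_time(virtual_length: float, speed: float) -> float:
    """Effective execution time of a process of virtual length l_p
    on a CPU of relative speed rho: l_p / rho."""
    return virtual_length / speed

def convert(elapsed: float, speed_from: float, speed_to: float) -> float:
    """Time that a run of `elapsed` units on a speed_from CPU
    would take on a speed_to CPU: elapsed * rho_1 / rho_2."""
    return elapsed * speed_from / speed_to

print(effective_time(6.0, 2.0))   # a virtual length of 6 takes 3 units at speed 2
print(convert(3.0, 2.0, 1.0))     # and 6 units on a reference (speed-1) CPU
```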
If the broker's choices are made without taking into account job and/or process lengths, choosing the process length at submission time (scheduling model) or when the job starts (queuing model) is equivalent. And if the local scheduling decision (choosing the CPU) is also made without information about lengths, the process length can even be chosen when the process starts.
These points of view are thus not incompatible: from a "macroscopic" vision, the scheduling model with an average virtual execution time of 1 (say, a distribution D with E[D] = 1) will behave the same way as the queuing model with a rate μ, if the execution distribution is D' such that f_{D'}(x) = μ·f_D(μ·x), ∀x ∈ ℝ⁺, where f_D and f_{D'} are the probability density functions of D and D'. Notice that compelling the average virtual execution time to be equal to 1 is not restrictive: this only constrains the choice of the time unit in order to reach a unitary average. If the virtual execution times are scaled, relative speeds are scaled accordingly.
We assume in our model that the execution time (or its average) does not depend on environmental factors, such as data transfer or communication costs, nor on local factors such as boot time, migrations, preemption or other scheduling costs.
As for the service rate μ, one can give two interpretations of the arrival rate λ.
From a queuing theory point of view, we have the concept of job stream, and λ can be seen as the inverse of the average inter-arrival delay. For instance, if arrivals follow a Poisson process, λ is the rate of this process.
From a scheduling point of view, we have an infinite set of job ids J (J ≝ ℕ* = {1, 2,...})², and for each j ∈ J, a_j is the submission (or arrival) time of the job j. The arrival rate λ is considered as the inverse of the average inter-arrival delay. In other words, {a_j | j ∈ J} is such that

    lim_{t→∞} ( Σ_{{j∈J | a_j<t}} (a_j − a_{j−1}) ) / ||{j ∈ J | a_j < t}|| = λ⁻¹,    with a_0 = 0.
As λ < ∞ (and then λ⁻¹ > 0), we can without loss of generality assume³ that J is sorted by submission time, meaning that ∀i < j, a_i ≤ a_j. This schema could help to visualize our system:

²The symbol ≝ means "is by definition" or "is defined as".

³If λ⁻¹ can be null, it is not always possible to re-order jobs. For instance, we could have infinitely many jobs arriving at some time t, and one job arriving after t. There is no possible numbering allowing to sort this scenario by submission time. Or we could have a_1 = 2 and a_i = 1 − 1/i ∀i ≥ 2, which does not allow a re-ordering either.
    0      a_1      a_2      ···      a_{j−1}      a_j      ···      → t

We then sum up every interval between 0 and a_{k(t)} (the last arrival before t), and divide this value by the number of intervals (k(t)). Of course, the sum equals a_{k(t)}: it telescopes, since every term but the first and the last cancels. Then,
    lim_{t→∞} a_{k(t)} / k(t) = λ⁻¹    (1.1)

where k(t) ≝ max{j ∈ J | a_j < t}. We need the set {j ∈ J | a_j < t} to be finite, or at least its maximum to exist for any t. But we have λ < ∞ (or λ⁻¹ > 0), which is a sufficient condition for that.
We introduce here a new notation.
Definition 1.25 (Asymptotic behavior)
A function f_1(t) behaves asymptotically like f_2(t) (denoted f_1(t) ~ f_2(t)) iff

    lim_{t→∞} f_1(t) / f_2(t) = 1.

Equivalently, we have

    f_1(t) = f_2(t) + ε(t),    with lim_{t→∞} ε(t) / f_2(t) = 0.
According to this definition, we can derive some results about the asymptotic behavior of the inter-arrival delay.
Lemma 1.1
a_{max{j∈J | a_j<t}} = a_{k(t)} ~ t.

Proof
We have to show that a_{k(t)} ~ t, or that lim_{t→∞} t / a_{k(t)} = 1. By definition of k(t), we have

    a_{k(t)} ≤ t < a_{k(t)+1}.

Then,

    1 ≤ t / a_{k(t)} < a_{k(t)+1} / a_{k(t)} = ( a_{k(t)+1} / (k(t) + 1) ) · ( (k(t) + 1) / a_{k(t)} ).

Taking the limit when t → ∞, we have (with Equation (1.1))

    1 ≤ lim_{t→∞} t / a_{k(t)} ≤ λ⁻¹ · ( λ + lim_{t→∞} 1 / a_{k(t)} ) = 1.

We then have lim_{t→∞} t / a_{k(t)} = 1. □
Now we have:

Theorem 1.2
max{j ∈ J | a_j < t} ~ λ·t.

Proof
We have lim_{t→∞} a_{k(t)} / k(t) = λ⁻¹ (from Equation (1.1)), and lim_{t→∞} t / a_{k(t)} = 1 (from Lemma 1.1). Then,

    lim_{t→∞} t / k(t) = lim_{t→∞} ( t / a_{k(t)} ) · ( a_{k(t)} / k(t) ) = λ⁻¹,

and hence k(t) = max{j ∈ J | a_j < t} ~ λ·t. □
This theorem will be useful later.
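Equation (1.1) and Theorem 1.2 are easy to check numerically for Poissonian arrivals. The sketch below (our own sanity check, not from the thesis) simulates a Poisson stream of rate λ and verifies that the number of arrivals before t, k(t), behaves like λ·t, while a_{k(t)}/k(t) approaches λ⁻¹:

```python
import random

def arrivals(lam, horizon, rng):
    """Arrival times of a Poisson process of rate lam, truncated at horizon."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(lam)
        if t >= horizon:
            return times
        times.append(t)

rng = random.Random(42)
lam, t = 3.0, 10_000.0
a = arrivals(lam, t, rng)
k_t = len(a)            # k(t) = max{j : a_j < t}
print(k_t / t)          # should be close to lam (Theorem 1.2)
print(a[-1] / k_t)      # a_{k(t)} / k(t), close to 1/lam (Equation (1.1))
```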
1.2.3 System Load
When studying computational systems, it is generally required to be able to measure the load of the system, giving an evaluation of the amount of work the system is processing. Again, we will give two approaches to this measurement, the first one from the scheduling world, the second one from queuing theory.
First, we need to define the concept of computational capacity:
Definition 1.26 (Computational capacity)
The Computational capacity of C_i is defined as μ_i·s_i. The Computational capacity of a grid G is defined as Σ_{i=1}^{N} μ_i·s_i, and is denoted M.
The computational capacity can be seen as the number of virtual CPUs, or the number of homogeneous CPUs of rate μ = 1 equivalent to the original system, in a perfect world where one can perfectly take advantage of a larger number of CPUs.
From the queuing theory point of view, it can also be seen as the total rate, or the rate that a unique server would need to reach to be virtually equivalent to the whole system. Indeed, the total service rate of a set of servers is the sum of the service rates of the individual servers, which is denoted by M. This number M is of course not necessarily a natural number.
From the scheduling point of view, we define now the system load ν(t_1, t_2) as the total amount of virtual work received in [t_1, t_2[, divided by the product of the total number of virtual CPUs (M) and the total duration (t_2 − t_1). Or, denoting by ℓ_j the total virtual length of job j,

    ν(t_1, t_2) ≝ ( Σ_{{j∈J | t_1 ≤ a_j < t_2}} ℓ_j ) / ( M · (t_2 − t_1) ).

Obviously, we have that if ν(t_1, t_2) > 1, some jobs received in [t_1, t_2[ cannot be completed. The system is then not schedulable on [t_1, t_2[. Here, not schedulable on [t_1, t_2[ means that there is no brokering and/or scheduling decision allowing to finish before t_2 all jobs received in [t_1, t_2[. In other words, if the arrival pattern on [t_1, t_2[ is indefinitely repeated with a period t_2 − t_1, at least one queue will increase indefinitely, or, if queue sizes are bounded, an unbounded number of jobs will be rejected.
Of course, being not schedulable on [t_1, t_2[ does not mean that jobs still waiting or running at time t_2 will not be completed after t_2. The system could for instance become schedulable if we extend the range.
The condition ν(t_1, t_2) ≤ 1 is a necessary condition of schedulability, but usually not a sufficient one. For instance, if a job arrives at time t' < t_2, but with a running time on the fastest CE greater than t_2 − t', it cannot complete before t_2. By definition, if at least one job arrives in [0, ∞[, it is always possible to find a non-schedulable interval: t_1 just before a job arrival, and t_2 just after this arrival.
However, in general, we are interested in long intervals, such as lim_{t_1→0, t_2→∞} [t_1, t_2[.
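The scheduling-side load is directly computable from a trace of arrivals. A small sketch (our own naming, under the convention that each job carries its total virtual length):

```python
def system_load(jobs, t1, t2, capacity):
    """nu(t1, t2): virtual work arriving in [t1, t2[ divided by M * (t2 - t1).
    jobs: list of (arrival_time, virtual_length) pairs.
    capacity: M = sum_i mu_i * s_i (Definition 1.26)."""
    work = sum(length for a, length in jobs if t1 <= a < t2)
    return work / (capacity * (t2 - t1))

# Two hypothetical CEs: 4 CPUs at rate 1.0 and 2 CPUs at rate 0.5 -> M = 5.0.
M = 4 * 1.0 + 2 * 0.5
jobs = [(0.5, 10.0), (1.0, 20.0), (3.5, 5.0)]
print(system_load(jobs, 0.0, 2.0, M))   # (10 + 20) / (5 * 2) = 3.0: not schedulable
print(system_load(jobs, 0.0, 10.0, M))  # 35 / 50 = 0.7: necessary condition holds
```

As in the text, extending the interval can bring the load back below 1 even though a sub-interval is overloaded.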
Now, from the queuing point of view, we would like to have a similar definition. The load is then classically defined as the arrival rate divided by the total service rate, or:

    ν ≝ λ / M.
It is easy to see that those two definitions are asymptotically equivalent, or that ν(t) ~ ν, where ν(t) stands for ν(0, t). Indeed, as mentioned above, to go from the first model to the second one, we fix ℓ_j = 1 on average (or lim_{n→∞} (1/n) Σ_{j=1}^{n} ℓ_j = 1), and keep the same μ and λ. We then have

    ν(t) = ( Σ_{{j∈J | a_j<t}} ℓ_j ) / (M · t)
         ~ ||{j ∈ J | a_j < t}|| / (M · t)
         = ( max{j ∈ J | a_j < t} / t ) · (1/M)
         ~ λ · (1/M)    (see Theorem (1.2))
         = ν.    (1.2)

The necessary condition of schedulability now becomes ν < 1.
1.2.4 System State
In order to broker jobs, it is often required to have some knowledge about the current state of the system. A first system state model — applicable only for sequential or parallel asynchronous jobs — would consist in knowing the number of processes present in each CE, running or waiting. Each CE can then be characterized by an integer. We will use in this case the following notations:
• x_i is the C_i state, or the number of processes currently present in the queue (waiting and running). We have x_i ∈ {0,..., B_i}.
• x = (x_1,..., x_N) is the system state.
In some cases, it will be useful to enrich the Computing Elements' state. We will come back to that point later on (cf. Chapter 4).
Generally, this information does not fully characterize the state of the system, because knowing the system state does not allow to predict the future. The system state could for instance include the elapsed time since the start of the jobs (or processes) currently running, or even their end time. However, it is often not realistic to have such information: knowing the end time of jobs is generally very difficult, or impossible. And in most cases, any information about start times will not give more information about the end time.
In the case of parallel jobs, it could be required to know the width of each job waiting in the queue.
We denote by S the state space of the system. In the first simple model we presented here above, we have S = {0,..., B_1} × ··· × {0,..., B_N}.
Notice that if, as will often be the case in this work, we focus on systems having the memorylessness property, such as Poissonian arrivals with exponential services, the system state we define here fully characterizes the system; knowing the time processes have already spent in the system does not give more information.
As we assume our system to be ASAP (see Definition 1.23), and as we only consider independent processes for now, we know that if x_i ≤ s_i, there are no jobs pending in the queue. If x_i > s_i, then s_i processes are running, and x_i − s_i processes are waiting.
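This consequence of the ASAP assumption makes the integer state x_i fully interpretable. A tiny illustration (our own naming, purely a sketch):

```python
def running_waiting(x_i: int, s_i: int):
    """Under an ASAP scheduler with independent processes: if x_i <= s_i all
    processes run; otherwise s_i run and x_i - s_i wait in the queue."""
    return min(x_i, s_i), max(0, x_i - s_i)

print(running_waiting(3, 4))   # fewer processes than CPUs: all running
print(running_waiting(7, 4))   # CPUs saturated: 4 running, 3 waiting
```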
In Chapter 4, this model will be refined to take parallel jobs into account. In that case, there exist situations where there are idle CPUs while some jobs are waiting in the queue.
1.2.5 Underlying Markov Chain
In several cases we are going to analyze in this document, we will focus on state transitions. If the brokering is only state-dependent, we can consider the underlying Markov chain.
This transition Markov chain gives the probability to go from one state to another, knowing that a transition has occurred. The probability that a transition goes from state x = (x_1,..., x_N) to y = (y_1,..., y_N) is denoted

    T((x_1,..., x_N), (y_1,..., y_N)).
For instance, in the case of sequential jobs, with no simultaneous departures or arrivals, T will be defined as follows. Let D_i(x) be the probability that an event (or transition) is a departure from C_i when the state is x, A(x) the probability that an event is an arrival when the state is x, and B_i(x) the probability that an incoming job is sent to C_i by the broker when the state is x. T is then defined as:

    T(x, x − 1_i) = D_i(x)          ∀i ∈ {1,..., N}
    T(x, x + 1_i) = A(x) · B_i(x)   ∀i ∈ {1,..., N}
    T(x, y) = 0                     otherwise,

where 1_i is the vector composed only of 0's, except a 1 at position i. The sum Σ_i B_i(x)