A programming model and execution environment for autonomous systems


Thesis

Reference

A programming model and execution environment for autonomous systems

RAZAFIMAHEFA, Chrislain

Abstract

This thesis presents the design and implementation of a programming model for autonomous systems. Autonomous systems are distributed systems based on wireless networks, mobile devices and the Internet. They are characterized by the high dynamics with which their configuration evolves. Ad hoc networks, one class of autonomous systems, illustrate this point, since in these networks participants can join and leave at any time. Similarly, in Peer-to-Peer networks, another class of autonomous systems, users can abruptly decide to no longer share their resources and to join only when needed. Besides voluntary disconnections decided by users, autonomous systems also suffer from disconnections caused by the infrastructure, such as network failures, latency and node failures. Disconnections are therefore a key issue in autonomous systems. Programming distributed systems has always proven to be difficult, but programming distributed systems where disconnections play a major role is even more difficult.

RAZAFIMAHEFA, Chrislain. A programming model and execution environment for autonomous systems. Doctoral thesis: Univ. Genève, 2004, no. SES 575

URN : urn:nbn:ch:unige-175725

DOI : 10.13097/archive-ouverte/unige:17572

Available at:

http://archive-ouverte.unige.ch/unige:17572

Disclaimer: layout of this document may differ from the published version.


A Programming Model and Execution Environment for Autonomous Systems

Thesis presented to the Faculty of Economic and Social Sciences of the University of Geneva

by

Chrislain RAZAFIMAHEFA

for the degree of

Doctor of Economic and Social Sciences, specialization in Information Systems

Members of the thesis jury:

Mr. Ciaran BRYCE, Maître d’enseignement et de recherche (senior lecturer and researcher)

Mr. Didier BUCHS, Professor, Faculty of Science

Ms. Laurie HENDREN, Professor, McGill University

Mr. Dimitri KONSTANTAS, Professor, thesis supervisor

Mr. Michel LEONARD, Professor, chair of the jury

Thesis no. 575

Geneva, 2004


The Faculty of Economic and Social Sciences, on the recommendation of the jury, has authorized the printing of this thesis, without thereby expressing any opinion on the propositions stated in it, which are the sole responsibility of their author.

Geneva, 30 July 2004

The Dean, Pierre ALLAN

Printed from the author’s manuscript


Acknowledgements

First of all, I would like to thank Prof. Dimitri Konstantas for supervising this thesis. Throughout these years, he gave me a great deal of advice that always proved judicious, and he allowed me to work in a serene environment.

Next, all my gratitude goes to Dr. Ciarán Bryce, who was present throughout this thesis and without whom everything would have been much more difficult. He shared his passion and his experience with me, guided me and offered me invaluable support. For everything he has given me, I thank him most warmly.

I would also like to thank Prof. Denis Tsichritzis for giving me the chance to join the Object Systems Group (Groupe Systèmes Objet), which he led at the time, and for allowing me to carry out a thesis there. He offered us a more than adequate setting in which to conduct our work.

I also warmly thank all the members of the jury for accepting their task and for suggesting improvements to the document. First, I think of Prof. Laurie Hendren, who accepted the role of external examiner. I had the chance to attend Prof. Hendren’s courses and to work under her supervision when I did my master’s degree in Montreal. I learned a great deal from working alongside her. I also thank Prof. Didier Buchs, who helped me with certain theoretical aspects of the thesis, as well as Prof. Michel Léonard, who accepted the role of chair of the jury.

Many thanks also go to Prof. Jan Vitek, who allowed my colleague Michel Pawlak and me to spend a few months in his research group at Purdue University in Indiana.

I would also like to warmly greet all the members of the Object Systems Group whom I was able to meet and with whom I always worked in a very good atmosphere.

Many thanks also to all my long-standing friends, in particular Claudine and Christine, two people whom I deeply appreciate and who have always strongly encouraged and supported me; Géraldine, Gergana, Muriel, Nathalie, Pierrine, Réjane, Antonio, Ciarán, Dimitri, David, Frédéric, Laurent and Pascal, who each in their own way have given me a great deal; and all those who shared moments with me at Dufour and who, I am sure, will recognize themselves.

My gratitude also goes to the Swiss National Science Foundation and to the State of Geneva for their support.

Finally, I owe a great deal to my parents and my brother for all the love and unconditional support they have shown throughout my journey. For this, I thank them deeply.


Dedicated to my family


List of Figures

2.1 An Autonomous System: an ad-hoc network of PDAs. At any time, new members can join and some might leave. Decisions to join or quit are taken unilaterally by members.

2.2 The causes of a disconnection are numerous in autonomous systems: a member abruptly leaves the network, a device goes through a tunnel or a failure is present on the network infrastructure.

2.3 This figure shows three interacting autonomous system devices. Devices are able to run multiple autonomous system entities concurrently. Entities communicate by exchanging events and they can migrate from device to device.

2.4 In the Java model, references have strong reference semantics and calls are synchronous. In autonomous systems, the model needs weak references and asynchronous calls because of uncertainty introduced by disconnections.

2.5 The Java 1.0 security model. All downloaded codes are isolated by being put into the sandbox region. They do not have access to the machine resources.

2.6 The Java 1.1 security model. Downloaded codes are either completely trusted, in which case they have access to all resources, or they are not trusted at all, in which case access to resources is completely restrained.

2.7 The Java 1.2 security model. Access to resources by trusted code can be specified in a more fine-grained manner than in the previous models. It is for instance possible to state that the disk is accessible but the network is not.

2.8 This figure shows a JVM process running three domains. It illustrates the idea that sandboxing, classloaders and Isolates are seeking: obtaining isolated computations where cross-domain references are not allowed.

2.9 In the Actor model, one thread is in charge of updating the actor encapsulated state according to asynchronous messages received. The same thread is responsible for sending messages to other actors as well as creating new actors.

2.10 Cells encapsulate state and import and export code at runtime.

2.11 This figure illustrates a hierarchy of Ambients where in an office, reside one laptop and one desktop running multiple applications.

2.12 The Ambient in primitive.

2.13 The Ambient out primitive.

2.14 The Ambient open primitive.

3.1 An environment with programs, spaces and a message board.

3.2 A Lana program, with four local objects.

3.3 Programs and spaces. All references to objects from outside their space are weak.

3.4 A hierarchy of Programs and object Spaces.

3.5 The hierarchy of principal Lana classes.

3.6 An example of a Lana Program.

3.7 A server program creating an agent and putting it into the message board.

3.8 A client program retrieving an agent from the server message board.

3.9 The core Lana class API.

3.10 The user/kernel boundary.

3.11 The example of the previous section revisited.

3.12 A capability object interposed between object spaces.

3.13 The creation and interposition of capabilities between spaces.

4.1 Distributing the virtual machine. The client VM does not possess a compiler or a loader. It rather exploits compiled and loaded code images received from dedicated servers.

4.2 Off-line interaction between client VMs and server VM.

4.3 The different steps leading a JVM to the execution of an application.

4.4 Delegated interaction between client VM and server VM.

5.1 An overview of the Joeq system.

5.2 An object representing a class in memory as well as the other objects it requires to implement its functionality.

5.3 Procedures used for producing off-line images of classes loaded by the server VM (part 1).

5.4 Procedures used for producing off-line images of classes loaded by the server VM (part 2).

5.5 Procedures used by the delegate off-line compiler to produce compiled code images (part 1).

5.6 Procedures used by the delegate off-line compiler to produce compiled code images (part 2).

5.7 Procedures used by the client to re-install compiled code images into its address space.

5.8 This figure shows how images are relocated from server to client. The right part of the figure shows the image from the left installed inside the address space of a running process with all relocations performed.

5.9 Gain due to removal of compiler and loader code for the VM image (sizes are in MB).

5.10 Speed up due to remote loading and compiling (times are in sec.), when Spec iteration parameter = 10.

5.11 Memory footprint reduction due to remote loading and compiling (sizes are in MB), when Spec iteration parameter = 10.

5.12 Speed up due to remote loading and compiling (times are in sec.), when Spec iteration parameter = 100.

5.13 Memory footprint gain due to remote loading and compiling (sizes are in MB), when Spec iteration parameter = 100.


Chapter 1 Introduction

Today we are witnessing the emergence of several trends in distributed systems. That is, the face of networks and especially the face of the Internet is changing. Whereas until recently, node elements of the network were mostly fixed and interconnected by cables, we are seeing today the emergence of wireless networks (Wi-Fi, GPRS, Bluetooth [97] to quote only a few of them) as well as mobile nodes (personal digital assistants, mobile phones and portable computers).

Several wireless protocols exist today that range over short and long distances. Wireless LAN protocols such as Wi-Fi are replacing cabled Ethernets; GSM, GPRS and the future implementation of the UMTS standard can connect mobile devices to the Internet, and Bluetooth is used to connect several devices over short distances. It is expected that in the next few years 40% of Internet accesses will be made through wireless connections [73].

Another trend visible today is the constant growth in the number and importance of mobile devices. Personal devices such as mobile telephones are becoming full-fledged computers. Telephones equipped with keyboards now exist, which have memory of the order of hundreds of megabytes, and which run sophisticated operating systems. Of course, these telephones support all the wireless protocols mentioned in the previous paragraph. For instance, GSM/GPRS can be used to gain access to the Internet and Bluetooth is ideal for data exchange between devices located close to each other. We can also observe that mobile devices often come with a smart card. This means that sensitive processing, e.g. for electronic commerce, can potentially be carried out on the device, and that mobile devices can become more secure than fixed computers. Today, 70% of the population already carry mobile devices [73]; this shows the important role these devices play in people’s everyday life and the important place they occupy in our computing infrastructure. Further, because current research in hardware is leading to size reduction and a constant increase in computing power and battery lifetime, we can expect that mobile devices will become even more ubiquitous.

A consequence of the advent of wireless networks and mobile devices is the emergence of new types of distributed systems based on ad hoc [59] or spontaneous networks. An ad-hoc network is made of wireless mobile devices moving around freely and cooperating. In an ad-hoc network, independent components (devices) not only play the role of a standard participant exploiting the resources present on the network, but also play the role of a routing element. This differentiates them from a node on a fixed network. Typical devices expected to be members of ad-hoc networks are PDAs. A key aspect of ad-hoc networks is that they can establish themselves without the need for a centralized administration or a fixed infrastructure (since each node is itself a router). Another important characteristic of an ad-hoc network is that its configuration can evolve dynamically at a fast pace; participants can join and leave the network at any time. Examples of ad-hoc network applications are numerous. They include for instance battlefield communication [44] where, despite the lack of a fixed infrastructure, soldiers can still continually receive and transmit orders and information from their hierarchy. Another example would be an ad-hoc network formed by neighbouring cars travelling on a highway. In the presence of an event, such as an accident, the information would be broadcast to members of the network.
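The relaying role of ad-hoc nodes described above can be illustrated with a small Python sketch. It is not taken from this thesis: the topology, the device names and the functions `find_route` and `leave` are hypothetical, and a plain breadth-first search stands in for whatever routing protocol a real ad-hoc network would use.

```python
from collections import deque

def find_route(links, source, target):
    """Breadth-first search for a multi-hop route; returns None if unreachable."""
    if source not in links or target not in links:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        # Every intermediate device acts as a router for its neighbours.
        for neighbour in sorted(links[node]):
            if neighbour in links and neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

def leave(links, node):
    """Return a new topology in which `node` has left unilaterally."""
    return {n: peers - {node} for n, peers in links.items() if n != node}

# Hypothetical topology: each PDA only hears its direct radio neighbours.
links = {
    "pda_a": {"pda_b"},
    "pda_b": {"pda_a", "pda_c", "pda_d"},
    "pda_c": {"pda_b", "pda_e"},
    "pda_d": {"pda_b", "pda_e"},
    "pda_e": {"pda_c", "pda_d"},
}
route_before = find_route(links, "pda_a", "pda_e")
route_after = find_route(leave(links, "pda_c"), "pda_a", "pda_e")
unreachable = find_route(leave(links, "pda_b"), "pda_a", "pda_e")
```

In this sketch the departure of `pda_c` only changes the route, whereas the departure of `pda_b` leaves `pda_a` disconnected. This is the kind of situation an autonomous program has to tolerate.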

Today, we are also seeing on the Internet the emergence of another kind of network: Peer-to-Peer networks [53]. What characterizes peer-to-peer systems is that nodes are at the same time both server and client. This is a shift from the traditional view (the client-server model) where a clear separation between the server and the client role exists. The goal of peer-to-peer systems is to allow resource sharing between members. Peer-to-peer systems have been popularised by file-sharing applications such as Napster [54] and Gnutella [33]. They are now used in projects such as SETI [6], which uses community members’ spare CPU time to analyse radio signals from outer space. Furthermore, many people in the operating systems community see disk space sharing among participating computers as a good solution to the disk backup problem [25]. As in ad-hoc networks, members of a peer-to-peer system can join and leave the network at any time.

Programming distributed systems has always proved to be a difficult task [91]. A distributed system is composed of several machines connected together via a network. These machines do not directly share memory but communicate via message exchange. In such systems, programs are inherently difficult to design, write, reason about and debug. There are several reasons for this. The latency incurred by the network is one of them. Latency is the time required for a message exchange between a source machine and a destination. It is defined by the network’s physical properties but depends also on the network load. Latency adds a degree of uncertainty in distributed systems and as a consequence can create performance problems. Besides latency, partial failure is another problem. A distributed system relies on the collaboration of several nodes to perform a task; however a node can fail or disappear. These failures must be taken care of if a complete failure of the entire system is to be avoided. In general, the burden of handling such failures is left to the programmer, which tends to further complicate his task.

With the advent of wireless communication media, mobile nodes, ad-hoc and peer-to-peer networks, computer systems are becoming more dynamic, reactive and complex. Programming a system possessing such properties is even harder than programming a "standard" distributed system. As can be seen from the following examples, the difficulty stems from the fact that disconnection plays a central role in such systems. Take for instance a user connected to the Internet via his mobile phone. If the phone passes through an area not covered by the cellular network, then his communication will suffer because a disconnection, which might be temporary, will occur. Another example is a Bluetooth piconet [97] where a user’s PDA may lose contact with other PDAs as the user wanders around the piconet area. Yet another example is an Internet peer-to-peer system where a user can connect and disconnect his PC from the community at any time. As we can see from these examples, disconnections have several sources. Networks that give their members the possibility of joining and leaving at any time are one source. Wireless network failures and limited coverage within certain areas such as tunnels are another. Furthermore, network latency, depending on the delay incurred, can also be perceived by a user as a disconnection. Since disconnections are so omnipresent in such systems, their correct handling is of primary importance. Failing to handle them can have a severe negative impact on system robustness; that is, on the system’s capability to resist failures and to minimize their impact.

1.1 Autonomous Systems Overview

This thesis contends that it is of primary importance to provide autonomy features to applications developed above today’s dynamic distributed systems. We expect that this would strengthen application robustness and ease the programming of such systems. We define an Autonomous System as a distributed system or application where machines may join and leave the network at any time.

By autonomy, we mean that programs in such systems must be able to continue working despite changes in the network. Thus a program must minimize its dependence on other programs, it must be able to cut its links to the environment as well as the environment’s links to the program, without crashing either the program or the environment. Further, network outage must not be considered as an error; rather, the system must enable programs to resume communication should the network reappear. The system must also contain mechanisms that maximize the potential for programs to locate information in a dynamic network where the set of machines available at any moment is arbitrary. In addition, programs are allowed to move around so that they have the possibility to escape from a node where they are unlikely to be able to continue their execution; or they can get closer to another program with which they plan to exchange a significant amount of data.

The autonomy features we have just described form part of the programming model that we present in this thesis.
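To make the requirement that a network outage is not an error more concrete, the following Python sketch shows one possible shape for a link that queues messages while the peer is unreachable and resumes delivery when the network reappears. It is only an illustration under our own assumptions: the class `WeakLink` and its methods are hypothetical names and are not the constructs of the programming model presented in this thesis.

```python
from collections import deque

class WeakLink:
    """Hypothetical asynchronous link that tolerates disconnection."""

    def __init__(self):
        self.connected = False
        self.severed = False
        self.pending = deque()   # messages waiting for the network
        self.delivered = []      # stands in for the remote program's inbox

    def send(self, message):
        """Never blocks and never raises; returns False once the link is cut."""
        if self.severed:
            return False
        self.pending.append(message)
        self._flush()
        return True

    def set_connected(self, connected):
        """Called when the network disappears or reappears."""
        self.connected = connected
        self._flush()

    def cut(self):
        """Sever the link deliberately without crashing either side."""
        self.severed = True
        self.pending.clear()

    def _flush(self):
        while self.connected and not self.severed and self.pending:
            self.delivered.append(self.pending.popleft())

link = WeakLink()
link.send("position update 1")       # network absent: the message is queued
link.set_connected(True)             # network appears: the message is delivered
link.set_connected(False)
link.send("position update 2")       # outage: queued again, no error raised
queued_during_outage = list(link.pending)
link.set_connected(True)             # communication resumes
link.cut()
accepted_after_cut = link.send("position update 3")
```

The sender never sees an exception in this sketch. An outage only delays delivery, and cutting the link is an ordinary operation whose effect is visible in the return value of `send`.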

Security is another issue that autonomous systems have to deal with. It might be impossible to prevent a machine from entering an autonomous system network, e.g., a PC connecting to a P2P community or a Bluetooth PDA entering a piconet. This means that a user will not be able to trust all other programs in the network. It is therefore crucial that measures be taken to prevent a malicious program from gaining access to a program’s data. In addition, the confidentiality and the integrity of messages exchanged between programs must be enforced. This means that only the designated receiver may gain access to the contents of a message, and that a program may not alter the contents of a message generated by another program.
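The integrity requirement can be illustrated with a short Python sketch. It assumes, purely for illustration, that sender and receiver already share a secret key and uses an HMAC from the Python standard library; this is a stand-in technique and not the security mechanism of this thesis. The functions `seal` and `unseal` are hypothetical names. Confidentiality would additionally require encryption, which is left out of the sketch.

```python
import hashlib
import hmac

TAG_LENGTH = 32  # size in bytes of a SHA-256 digest

def seal(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an authentication tag computed from the key."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload

def unseal(key: bytes, sealed: bytes):
    """Return the payload if the tag matches, or None if it was altered."""
    tag, payload = sealed[:TAG_LENGTH], sealed[TAG_LENGTH:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None

key = b"key shared by sender and receiver"   # assumption: distributed beforehand
sealed = seal(key, b"transfer 10 units")
intact = unseal(key, sealed)
tampered = unseal(key, sealed[:-5] + b"90000")     # payload altered in transit
wrong_key = unseal(b"key of another program", sealed)
```

A program that does not hold the key cannot produce a message that the receiver accepts, and any alteration of a sealed message is detected on reception.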

Considering the success of wireless devices among consumers today, we can expect that the number of mobile nodes will soon far exceed the number of fixed nodes on the Internet. Consequently, autonomous system networks will be highly heterogeneous in terms of hardware and software. For this reason, in this thesis, in addition to recognizing the importance of autonomy, we also put a lot of emphasis on dealing with heterogeneity. What is interesting is that, as we will see below, the solutions we adopted for dealing with heterogeneity turn out to also reinforce autonomy.

When developing applications for heterogeneous environments, the usual approach is to strive for portability. This is the ability to run the same applications on different platforms without having to adapt them, i.e. modify their code, for each platform. The success of the Java programming language [35, 49] is largely due to the fact that its execution environment has been ported to many hardware and operating system platforms.

To obtain portability, an adequate solution is to resort to virtual machines [49]. A virtual machine (VM) is a software abstraction that erases the differences which can exist between different hardware and operating system platforms. A key element of this thesis is that we design and develop a virtual machine for our autonomous programming model.

Since in our model autonomous programs are run above a virtual machine, to further reinforce the overall autonomy of the entire system, we choose to design the virtual machine as a distributed one. In general, virtual machines are built in a monolithic way. Monolithic means that the elements of the virtual machine have their code and data strongly intertwined. This architecture finds its raison d’être mainly in the need for speed when running applications. Distributing the virtual machine means that its elements, such as the loader or the compiler, can be extracted from its core and made available as separate autonomous components. These components can even execute on separate nodes. In contrast to the monolithic approach, VM elements are no longer interdependent. One advantage of the approach is that the VM workload is distributed over several nodes. This can improve application running time and also reduce a device’s memory usage. Indeed, the approach can be used to support devices with little memory, since only the VM core needs to be installed on them; the other VM elements run on other nodes of the network. Another advantage is the possibility for VM elements to perform their tasks ahead of time (e.g. off-line compilation of classes) so that several client devices can directly exploit the results of these tasks. The benefit comes from the fact that a task does not need to be performed again and again. Finally, in such a configuration, it is easier to upgrade the VM elements.
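The benefit of performing a VM task once and sharing the result among clients can be sketched in a few lines of Python. The sketch is a toy under our own assumptions and does not describe the virtual machine developed in this thesis: the classes `CompileServer` and `ClientVM` are hypothetical, both run in a single process instead of on separate nodes, and Python’s built-in `compile` and `exec` stand in for a real compiler and a real VM core.

```python
class CompileServer:
    """Hypothetical VM element that compiles classes on behalf of clients."""

    def __init__(self, sources):
        self.sources = sources    # class name -> source text
        self.images = {}          # class name -> compiled code image
        self.compilations = 0

    def image_for(self, class_name):
        # Compile only on the first request; later clients reuse the image.
        if class_name not in self.images:
            self.compilations += 1
            self.images[class_name] = compile(
                self.sources[class_name], class_name, "exec")
        return self.images[class_name]

class ClientVM:
    """Core-only client: in this sketch it never compiles anything itself."""

    def __init__(self, server):
        self.server = server

    def run(self, class_name):
        namespace = {}
        exec(self.server.image_for(class_name), namespace)
        return namespace["main"]()

server = CompileServer({"Greeter": "def main():\n    return 'hello'\n"})
first_result = ClientVM(server).run("Greeter")
second_result = ClientVM(server).run("Greeter")
```

Both clients obtain the same result, while the server compiles the class only once. A client device therefore does not need to carry a compiler, and the cost of compilation is paid a single time for all clients.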

1.2 Thesis Contribution

The aim of this thesis is to provide a programming model for applications running above today’s dynamic distributed systems. The model insists on the importance of providing autonomy support to applications, as well as on providing a solution for heterogeneity in such systems. The contributions of this thesis are therefore centred around these two aspects.

• Autonomous System Programming Model. This thesis presents the design and implementation of a programming model for autonomous systems, taking into account their properties and their requirements. The model, called Lana, defines programming language primitives that take into account autonomy and security for autonomous systems as well as related aspects such as mobility, robustness in the presence of breakdowns or sporadic failures, safety in open and dynamic networks and portability and collaboration in a strongly heterogeneous world.

• Autonomous System Execution Environment. The second contribution of the thesis is the development of the supporting virtual machine, called LanaVM. This virtual machine is distributed in the sense that different components of the virtual machine such as the loader and the compiler are separated from the core VM and run as autonomous entities. This approach is original in the way virtual machines are built, and it presents several advantages. Distribution increases the autonomy of the applications running above the VM, improves performance and reduces memory usage.

1.3 Plan of Document

The thesis is divided into two parts. The first part, covering Chapters 2 and 3, concerns the provision of autonomy in dynamic distributed systems, and the second part presents the design and implementation of the associated virtual machine and contains Chapters 4 and 5.

Chapter 2 is devoted to autonomous systems. We present their properties and define the requirements that have to be satisfied by a programming model for autonomous systems. In addition, the chapter provides a presentation of existing research in programming models for autonomous systems.

Chapter 3 presents our programming model proposal for autonomous systems. The chapter describes the new constructs we propose for ensuring autonomy, security and robustness.

Chapter 4 explains the rationale behind the design of our distributed virtual machine for autonomous systems and presents related work, whereas Chapter 5 describes the design and implementation of the distributed virtual machine and provides an evaluation of the approach.

Finally, the last chapter presents the thesis conclusions.


Part I

Autonomous Systems Model


Chapter 2 Autonomous Systems

During the last few years, we have witnessed a change in the physical computing infrastructure. Previously, the landscape was mainly composed of fixed network nodes connected together by cabled networks, a situation that we call the traditional infrastructure. Today we see the emergence of wireless networks and of mobile nodes such as PDAs, mobile phones and the like. These recent developments complement the traditional infrastructure on which the Internet has been developing for decades. The environments formed by the association of the physical infrastructure (i.e. wireless and cabled networks, mobile and fixed nodes) and the applications running above this infrastructure are what we call Autonomous Systems. In the rest of this document, we will refer to the applications as autonomous systems applications and to the mobile devices as autonomous systems devices.

The emergence of wireless networks and mobile nodes brings new opportunities as well as new problems. Among the opportunities, we can see today the development of new types of applications such as those based on ad-hoc networks. These networks are characterized by the fact that participating nodes are themselves routing elements relaying packets on behalf of one another. This means that they do not need to rely on a fixed network infrastructure to function properly and that in such networks, nodes can leave and join at any time, as illustrated by Figure 2.1. Ad-hoc networks have found applications in areas such as battleground communications, disaster recovery efforts, conferencing without the support of a wired infrastructure, and interactive information sharing [59]. Another class of Autonomous System is Peer-to-Peer networking [53], where nodes pool their resources for sharing. In these networks, as in ad-hoc networks, the network configuration evolves very dynamically.

Figure 2.1: An Autonomous System: an ad-hoc network of PDAs. At any time, new members can join and some might leave. Decisions to join or quit are taken unilaterally by members.

Among the issues raised by Autonomous Systems, problems related to disconnections are numerous and important. There are several reasons for disconnection in Autonomous Systems. First, because nodes in the system have the possibility of being mobile, they can move around or disconnect from the network. Consequently, they then become unreachable and leave their communicating parties in a disconnected state. Second, because wireless networks are less reliable than cabled networks and further cannot cover certain areas, more frequent failures and disconnections are expected. Finally, because of the ad hoc nature of Autonomous Systems, i.e. nodes leave and join at any time, disconnection is further exacerbated.

From a programming point of view, Autonomous Systems raise new issues for which, we believe, current programming models do not provide adequate solutions. Among these issues, we can mention the problem of handling the autonomy needs of Autonomous Systems, their robustness and their security.

A contribution of this thesis is the provision of a programming model that matches the needs of Autonomous Systems. Before presenting our programming model in Chapter 3, we start this chapter by describing the evolution of today’s computing infrastructure. We then identify the properties that characterize Autonomous Systems and, based on these properties, elaborate upon the requirements that a programming model must satisfy in order to support Autonomous Systems. This is followed by a review of current programming models, focusing on why they do not satisfy the requirements of Autonomous Systems. Finally, the chapter ends with a description of research similar to ours, i.e. research providing solutions to Autonomous Systems issues.

2.1 Physical Infrastructure Evolution

Since physical infrastructures have an influence on programming paradigms, we start by describing the infrastructure that existed before the appearance of Autonomous Systems.

Before the emergence of personal digital assistants (PDAs), mobile phones, other mobile devices and wireless networks, the physical infrastructure was generally made of servers and desktop computers connected together to form a local network. Connected together, these local networks form the global network we know today as the Internet. At the time, there was no alternative to using cables for connecting devices together. In this model, mobility was severely hindered because devices were too cumbersome; even devices that were potentially mobile, such as laptops, had to choose between mobility and network connection, since connections had to go through cables.

Since then, the physical infrastructure has experienced some changes. The evolution we are talking about concerns primarily the emergence of a new category of devices. These are more lightweight, more portable and more mobile than servers and desktop computers. Among these devices we can mention PDAs, mobile phones, laptops and other future mobile devices created from the imagination of designers. This sector keeps growing day after day and the number of devices sold is constantly increasing. Today, the global number of mobile users is estimated at 1.52 billion [19].


Today, we are witnessing the convergence of PDAs and mobile phones. Some of these devices are already equipped with digital cameras. It seems probable that in the near future, a mobile device will combine many functionalities in a single piece of hardware; a device will be at the same time an electronic purse, a digital camera, a PDA, a telephone, and so on. Moreover, these functionalities could even be enriched by the possibility of dynamically downloading new services. This kind of device will gain more and more importance in our daily lives. However, there remain problems to be solved before such devices gain acceptance by a large audience. Short battery lifetime and the limited size of the graphical user interface, which does not facilitate data input and reading, are some of these problems.

The other factor that influences the evolution of the physical infrastructure is the emergence of wireless networks. Whether it is for short-distance communication with Bluetooth [97], for a local network with a standard such as Wi-Fi [96], or even for long distances with mobile phone networks, wireless standards are widespread. Nowadays, it is possible to establish a communication at any moment without resorting to a cable. Wireless networks are less reliable than cabled networks, but this does not hinder their widespread acceptance.

2.2 Example of Autonomous Systems Applications

The association of mobile devices and wireless networks is opening up new opportunities, but only a small portion of these opportunities has been explored so far. New applications are emerging from this association. Among them we can mention mobile applications (also called "m-applications") [70], whose novelty comes from the fact that they can be used while their users are on the move. Another is mobile commerce (also known as "m-commerce"), which belongs to the category of mobile applications. The industry expects revenues in the order of billions of dollars from m-commerce [70]. This shows that the stakes are high in this area.

It is therefore not surprising that hardware manufacturers and software developers are showing a lot of interest in these domains. In mobile applications, the notions of locality and context-awareness will play an important role. These applications will also have to take into account the ad hoc nature of the networks on which they execute. These aspects differentiate mobile applications from other applications, and they explain why new programming models are required to handle their unique properties.

An example scenario of a mobile application is one where a representative is sent by his company to a foreign country to negotiate a new contract. While travelling by train, the representative, in order to better prepare his negotiation with the client, uses an application on his laptop that requires constant interaction with a server application residing at the company headquarters. The application retrieves data from the company’s databases, and communication between the server and the laptop is mostly carried by a wireless infrastructure. A recurrent problem for the representative is the fluctuating quality of service of the wireless infrastructure. For instance, communication can be completely interrupted because the train goes through a tunnel, or bandwidth can vary greatly during the journey because the train traverses regions that are more or less well covered by the provider’s infrastructure. These are typical problems faced by autonomous systems.

As mentioned in the introduction, another category of autonomous systems applications is the one based upon ad-hoc networks. To give an example, we can imagine that during a conference each participant’s device serves as a node in an ad-hoc network. The advantage is that the devices can co-operate even when no fixed infrastructure is present. And even where such an infrastructure exists, the ad-hoc network would carry a certain amount of the communication and would therefore decrease the infrastructure’s load [62]. In such an application, people would move around with their devices and form spontaneous networks with other people sharing the same interests.

Their devices would then be used, for instance, simply for exchanging data, for searching for information or for collaborating on the same application. In our example application, we can expect the configuration of the network formed by the participants to be highly dynamic. New people can join a group at any time and participants can leave whenever they want. This is typical of ad-hoc networks. Another property of these networks is that it is not possible to rely on the notion of a centralized server or service. In our example, for instance, if a participant is looking for information about the city where the conference is taking place, there will be no clearly designated single point from which he will be able to retrieve the desired information.

Remote sensors [2] represent another category of ad-hoc networks. These devices are very small pieces of hardware with limited functionality. Applications running on these devices could, for instance, be targeted at collecting various information at places where access is difficult for humans. Collecting temperatures at different locations inside a volcano is a typical example [62]. The sensors’ duty is to transmit the collected data to a remote server. Because of the strong constraint on power consumption, and because the cost of wireless communication increases with distance, being organized as an ad-hoc network helps sensors to reduce their energy consumption [65].
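The energy argument can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the energy needed to send one message grows with the square of the distance covered (the text above only states that the cost increases with distance) and it ignores the energy relays spend on receiving. The class and method names are hypothetical.

```java
// Illustrative only: assumes transmission energy proportional to distance^2
// and ignores the energy spent by relays on receiving.
final class RelayEnergySketch {
    // Energy (arbitrary units) to send one message over `distance` in one hop.
    static double directCost(double distance) {
        return distance * distance;
    }

    // Energy to cover the same distance through `hops` equally spaced hops.
    static double multiHopCost(double distance, int hops) {
        double hopLength = distance / hops;
        return hops * directCost(hopLength);
    }

    public static void main(String[] args) {
        System.out.println(directCost(100.0));      // one long hop
        System.out.println(multiHopCost(100.0, 4)); // four short hops
    }
}
```

Under this assumption, four hops of 25 units cost a quarter of one 100-unit hop. With real radios the gain would be smaller, since every relay also spends energy receiving and forwarding.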

Another example of autonomous systems applications is the family based on peer-to-peer networks [33, 68]. The first application that popularized the idea was Napster [54]. Its goal was to facilitate file exchange between users. In Napster, the files that users are willing to share are announced to an index held on a central server. When a user requires a file, he queries the server and the latter tells him where to find it. It suffices to click and the file is downloaded to the client. An application such as Napster is really easy to use. This explains why millions of users were members of the Napster network and why successors to Napster such as Gnutella [33] are also so popular today.

Due to its success, more and more users joined Napster. However, as the shared files were indexed on a central server, Napster had to face a severe scalability problem. This issue reveals a property of autonomous systems: they do not cope well with centralisation. As was the case with Napster, peer-to-peer networks were initially used mainly for file sharing, but we are seeing a shift towards the sharing of any resource available on a machine, e.g. processing power and storage capacity. This is the goal that projects like Grid computing [28] try to achieve. With this extension to other areas, the popularity of peer-to-peer applications will certainly not decrease.

2.3 Autonomous Systems Properties

From the examples described in the previous section, we can infer a set of properties characterizing autonomous systems.


An autonomous system is a system composed of distributed participants willing to collaborate and to share common resources. In an autonomous system, the number of participants can be extremely high as exhibited by peer-to-peer systems. In this situation, what is important is to ensure that the model on which the autonomous system relies scales.

Another important characteristic that distinguishes autonomous systems from other models is the absence of a centralized entity. Intrinsically, there might be no centralisation in an autonomous system. There are different reasons for this. One of them is mobility. When an entity is highly mobile, it cannot always stay directly connected to a central point without going through other relays. Rather, it has to exploit the resources located in its neighbourhood, that is, it has to connect to and rely on a close neighbour to carry its communication. This is well illustrated by ad-hoc networks, as we have seen in the examples of the previous section. Indeed, an important characteristic of ad-hoc networks is that each node is equipped with routing capabilities, and a node willing to communicate carries its communication through the closest nodes. In autonomous systems, the absence of a central entity is not only something that has to be endured; it is also a desired property of the system, as can be seen from the example of Napster described in the previous section. We saw that Napster’s design choice of indexing sharable files on a central server was doomed to failure. In essence, the fundamental idea behind peer-to-peer is to avoid having a central entity as the basis for resource sharing and exploitation. Indeed, downloading a resource such as a file from a unique point is clearly not a scalable solution. Rather, having the resource available on many distinct nodes distributes the load over different machines and provides a solution that scales better. Therefore, avoiding centralization is a clear goal in autonomous systems.

Another property of autonomous systems is that the system can be widely distributed. What we mean here is that very long distances can separate nodes. As a consequence, several problems arise, such as the amount of time taken by communication. Indeed, beyond a certain distance, communication time is no longer negligible nor transparent to the user. This is a direct consequence of the finite speed of light. Another problem that has to be dealt with when the system is widely distributed is congestion, which happens when the network is overloaded. Further, when communicating in an autonomous system, several barriers tend to hinder the interaction.

One of the barriers is the so-called "tunnel effect"¹. We see a manifestation of this phenomenon when an autonomous system device such as a PDA involved in a wireless communication goes through a tunnel: its transmission is subject to failures. The tunnel effect can be generalized to any situation where communication is hindered by physical elements. Another barrier comes from security requirements. Nowadays, to protect against intrusion, firewalls are placed around local area networks and filter out undesired remote accesses. These barriers tend to complicate interaction between autonomous system devices.

Another recurrent theme observable in autonomous systems is the high dynamics with which the network configuration can evolve. In ad-hoc networks or peer-to-peer systems, nodes can join and leave the network at any time, as illustrated by the example we mentioned previously of people exchanging data and collaborating during a conference. Two factors allow this rapid evolution of the network configuration: the emergence of local and wide-area wireless networks, and device mobility. Rapid evolution of the network configuration poses certain problems. For instance, no centralisation whatsoever can be assumed, since the composition changes constantly. This means that problems such as locating services on the network or knowing who is present on the network become more difficult, because no central point can be queried. Another problem is to decide how to react when nodes acting as servers abruptly leave the network, i.e. what solutions the system can propose to clients in order to deal with the disconnection.

Finally, drawing from the properties described above, we observe that disconnection and failures are important characteristics of autonomous systems. Several examples illustrate this. For instance, we saw that a property of autonomous systems is the rapid evolution of the network configuration.

This leads to a situation where client nodes can be disconnected from server nodes at any time, leading to a failure if nothing is done about it. As we also saw, wide distribution comes with problems such as congestion and the difficulty for on-going communication to cross physical and security barriers. This leads to a situation that a node could also perceive as a failure. In the same vein, device mobility and the limited reliability of wireless networks increase the potential for disconnection. Therefore we conclude that disconnection is inherent to autonomous systems and, since it can manifest itself so frequently, that it requires special attention.

¹ We are of course not talking here about the well-known quantum phenomenon of physics.

2.4 Programming Model Requirements

One of the contributions of this thesis is the provision of a programming model that suits the needs of autonomous systems. The programming model is presented in the next chapter; in this section we first give the requirements with which such a model must comply. We believe that these requirements are necessary if the model is to provide good support for autonomous systems. They are derived from the observation of autonomous systems properties.

• Support for disconnection. We saw that disconnection plays an important role in autonomous systems. For instance, mobility allows devices to join and quit networks at any moment. Further, the tunnel effect, congestion, security barriers and the like also increase the potential for disconnection. Figure 2.2 presents several situations leading to a disconnection. Disconnection is sufficiently important to deserve special attention in autonomous systems. Consequently, every primitive developed for our autonomous system programming model will have to take disconnection into account. Our design guideline is thus that, when facing a choice, we favour programming primitives that support or help to improve the support of disconnected operation. For instance, an area of the programming model where such a choice arises is communication.

Choices for a communication model concern such aspects as deciding whether communication will be synchronous or asynchronous, or whether or not the system shall maintain information about the current state of a communication. From what we know about autonomous systems, it seems more appropriate to choose an asynchronous model as the basis for their communication.

Another area of the programming model where disconnection has an impact is coordination [31]. In order to achieve common goals, autonomous system elements need to collaborate. This is what coordination is all about. To collaborate means being able to share data and being able to apply some processing to these data. In an autonomous system, coordination is complicated by the constraint imposed by possible disconnections. For instance, collaboration requires that data be exchanged between collaborating entities. The problem with autonomous systems is that the exchange must still be possible even if one or more of the entities involved become disconnected.

If we try to solve this problem with the model provided by synchronous communication, we realize that this model is not suitable, since it requires both the caller and the callee to be present at the same time during the data exchange.

Figure 2.2: The causes of a disconnection are numerous in autonomous systems: a member abruptly leaves the network, a device goes through a tunnel or a failure is present on the network infrastructure.

Another requirement for an autonomous system programming model, in order to support possible disconnections, is to avoid any reference with strong reference semantics between any two entities. A reference to an entity has strong reference semantics when it either directly points to the entity or gives the illusion of such a direct reference. Remote Procedure Call (RPC) [55] and Remote Method Invocation (RMI) [78] are examples of systems with strong reference semantics. Even though a local reference cannot directly point to a remote object, RMI’s goal is to give this illusion to the programmer. The problem with this semantics in autonomous systems is that disconnections are too frequent. They would lead to dangling references, i.e. references whose targets are no longer present. Consequently, disconnections prevent any possibility of maintaining the illusion of a direct connection.

To conclude the discussion of the requirements the model must meet to support disconnected operation, we observe that by seeking to improve support for disconnection, what we are really seeking is to improve the autonomy of the system members.

• No reliance on centralization. In a world characterized by a high disconnection rate, it is not desirable to rely on a central point, for the latter might simply disappear due to voluntary or involuntary disconnection. Therefore no assumption about centralization should be made at any level of the model. For instance, when a user needs some information about the weather, he should not have to rely on a specific site such as www.weather.com, but should rather rely on the system’s search capabilities to find it. The system might, for instance, broadcast a query to its neighbourhood to do so.

• Scalability. There can potentially be many participants in an autonomous system. It suffices to mention the number of users in a peer-to-peer system as an example. Further, due to ease of mobility, an autonomous system network can evolve quite rapidly. Therefore all programming constructs, rules and principles we are developing for the autonomous systems programming model must scale. The scalability issue explains why centralization is to be avoided in an autonomous system. Indeed, in a peer-to-peer system, if all the available resources had to be downloaded from a central point, then with a growing number of participants this central point would at some stage fail to deliver.


• Support for mobility. In an autonomous system, failures and disconnections are frequent. Mobility is one way to overcome the problems they cause. There are two kinds of mobility: physical mobility of devices and users, and code mobility. When a device gets corrupted or is seen to misbehave, a program residing on the device and representing an autonomous entity could be sent to another device and continue working from there, in order to escape the bad conditions on the initial device. Another reason that justifies the exploitation of code mobility is when a collaboration between autonomous system entities requires huge amounts of data. In order to save bandwidth and to avoid latency problems, the entity needing the data could move to the data provider. Therefore another requirement of the model is the support of both kinds of mobility.

• Security. Security is of high importance to autonomous systems. There are several reasons for this. First, there is the need to protect data exchanged between autonomous entities. No one should be able to intercept or alter a message that is not intended for him. In other words, the integrity and confidentiality of messages must be preserved. Second, since code mobility is a feature offered by the system, the impact of downloaded code must be controlled in order to avoid negative effects such as virus infection or denial-of-service attacks. Finally, measures must be taken to prevent two entities present on the same device from interfering negatively with each other.

• Support for multitasking. Because of code mobility, or simply because one or several users might want to run multiple applications, an autonomous system device should be able to support the execution of multiple programs on the device, i.e. it has to be ready for multitasking.

• Event-oriented. In environments similar to autonomous systems, we expect to see a lot of activities happening asynchronously. To give a few examples, these could be the appearance and disappearance of network nodes, the announcement of an entity’s desire to enter into communication with another entity, or the announcement by the environment of changing conditions to a node about to enter a new area. Events are needed in autonomous systems because, contrary to other communication mechanisms such as remote method calls, they correspond better to an asynchronous and disconnected model of communication.

• Support for service discovery. In order to collaborate, autonomous system entities need to locate each other as well as resources and services they might require. However, properties of autonomous systems such as the potentially high number of participants, mobility with nodes leaving and joining the network at any time, and the impossibility of exploiting a central entity complicate resource discovery. Indeed, a service available a few minutes ago might have disappeared simply because the device on which it was provided has left the network. Therefore the programming model should provide an appropriate mechanism for matching resource queries with resource offers in the presence of the above constraints.

Another issue in resource discovery in autonomous systems concerns resource naming. Autonomous systems are distributed systems with a high degree of heterogeneity in terms of members’ backgrounds and interests. Therefore, we cannot expect these disparate members to agree on the way resources are named in the system. As a consequence, the programming model should also seek to provide the necessary means for allowing resource discovery even when naming schemes differ.
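To make the service discovery requirement more concrete, here is a minimal sketch of how a single node might match resource queries with the offers it has heard from its neighbours. It is not the mechanism of our programming model: the class, the keyword-based description of a service and the lease that makes stale offers expire are all assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: each node keeps a local cache of offers heard from
// its neighbours. Offers are described by keywords rather than agreed names
// and expire unless they are announced again.
final class DiscoverySketch {
    private static final class Offer {
        final String provider;
        final Set<String> keywords;
        final long expiresAtMillis;

        Offer(String provider, Set<String> keywords, long expiresAtMillis) {
            this.provider = provider;
            this.keywords = keywords;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final List<Offer> offers = new ArrayList<>();

    // A provider announces a service that stays valid for `leaseMillis`.
    void announce(String provider, Set<String> keywords, long now, long leaseMillis) {
        offers.add(new Offer(provider, keywords, now + leaseMillis));
    }

    // Providers whose offer is still valid and covers every requested keyword.
    List<String> lookup(Set<String> wanted, long now) {
        List<String> result = new ArrayList<>();
        for (Offer offer : offers) {
            if (offer.expiresAtMillis > now && offer.keywords.containsAll(wanted)) {
                result.add(offer.provider);
            }
        }
        return result;
    }
}
```

The lease reflects the observation that a service available a few minutes ago may have left the network, and matching on keywords is one simple way of not depending on a shared naming scheme.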

Figure 2.3 shows an environment complying with the requirements for autonomous systems mentioned above. The figure shows autonomous system devices capable of supporting both multitasking and migration of autonomous system entities; the entities communicate using an event-based mechanism.
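The combination of event-based communication and support for disconnection can be sketched as a mailbox that buffers events while their receiver is away. This is only an illustration under our own assumptions (a single receiver, events represented as strings, hypothetical class and method names); the primitives of the actual model are introduced in the next chapter.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: events sent to a disconnected entity are buffered
// instead of being lost, and the sender never waits for the receiver.
final class EventMailboxSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>();
    private boolean connected = false;

    // Deliver immediately if the receiver is reachable, otherwise queue.
    void post(String event) {
        if (connected) {
            delivered.add(event);
        } else {
            pending.add(event);
        }
    }

    // Called when the receiving entity (re)joins the network.
    void reconnect() {
        connected = true;
        while (!pending.isEmpty()) {
            delivered.add(pending.remove());
        }
    }

    // Called when the receiving entity leaves the network.
    void disconnect() {
        connected = false;
    }

    List<String> delivered() {
        return delivered;
    }
}
```

In contrast with a synchronous call, `post` succeeds whether or not the receiver is present at that moment, which is the property the requirements above ask for.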

2.5 Adequacy of Current Programming Models

We described in the previous sections the characteristics and properties of autonomous systems as well as the requirements that a programming model must meet in order to support them. Now, based on these properties and requirements, we turn our attention to the question of whether current programming models such as the one proposed by the Java programming language [35] are adequate for autonomous systems. Java has been chosen because it provides its own answers to several issues we tackle in autonomous systems, e.g. distribution, security and code mobility. Further, Java has gained widespread acceptance and is representative of today’s trends in programming languages.

Figure 2.3: This figure shows three interacting autonomous system devices. Devices are able to run multiple autonomous system entities concurrently. Entities communicate by exchanging events and they can migrate from device to device.


2.5.1 Java’s programming model

Java is a strongly typed object-oriented language. During a Java program execution, objects are instantiated with their fields either containing primitive values or pointing to other objects using strong references. References are strong when they directly point to their targeted object, and are weak when they point to a proxy object in charge of representing the targeted object. In Java, objects exchange messages through synchronous method calls.

Synchronous means that when the caller sends its message, it waits for an immediate answer; therefore, the callee must be present at the time the call is made and must process it right away.

From the properties mentioned above, we can infer that Java is oriented towards an environment where locality, synchrony and connection prevail. When we talk about locality, we mean locality of references, i.e. the fact that Java assumes that a reference will always point to an object present locally on the heap (i.e. in local memory) and not to an object present on another machine. Similarly, the Java model is synchronous because callers are blocked until the callee has performed the required task. Only then can the caller continue its task. We say that Java assumes connectivity because communications (calls) happen locally. Callers and callees are located on the same machine and point to each other with direct references. This physical proximity and the physical characteristics of today’s hardware make a failure highly improbable.

Contrast this with an autonomous systems environment, where collaborating objects can potentially be located on different machines. This leads to a situation where reference targets might not be present locally, where communication can no longer assume connectivity because of the tunnel effect and its derivatives, and where synchrony must be abandoned because the delay between a request and its satisfaction can be arbitrarily long. This suggests that the Java model as described so far does not fulfil the requirements of autonomous systems. The difference between the Java model and the autonomous system one is illustrated in Figure 2.4.
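The contrast between the two styles of interaction can be sketched in Java itself. The example below is an illustration under our own assumptions, not part of the proposed model: the class and method names are hypothetical, and `CompletableFuture` from the standard library merely stands in for a reply that may arrive late or never.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AsyncCallSketch {
    // Local, synchronous call: the caller is blocked until the callee returns.
    static String syncCall(String request) {
        return "reply to " + request;
    }

    // Asynchronous style: the caller holds a future for the reply and decides
    // how long it is prepared to wait for a possibly disconnected callee.
    static String awaitReply(CompletableFuture<String> reply, long timeoutMillis) {
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "no reply yet"; // callee unreachable or merely slow
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed";
        }
    }
}
```

Note that a timeout cannot tell a failed callee from a slow one; it only lets the caller continue instead of blocking indefinitely.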

However, one could argue that other specifications developed around Java are sufficient to adapt Java’s model to support Autonomous Systems needs.

For instance, Java provides the RMI (Remote Method Invocation) specification [78]. RMI’s goal is to allow Java to support distributed computing.

Figure 2.4: In the Java model, references have strong reference semantics and calls are synchronous. In autonomous systems, the model needs weak references and asynchronous calls because of the uncertainty introduced by disconnections.

When deciding on a distributed computing model, designers have the choice of making distribution transparent to the programmer or not. In the case of RMI and other proposals with similar goals, such as RPC [10, 9] or CORBA [56], distribution is transparent. This means that it should not make any difference to a programmer whether an object is created locally or is created and located on the opposite side of the world. The obvious advantage of such an approach is that programmers are relieved of the burden of handling distribution. In other words, they can keep thinking about distributed problems as if they were local. That is, the mental model being used is the usual one of local, synchronous and connected interaction. However, experience [16, 91, 42] shows that transparent distribution works well when the deployment concerns a local network, but that many problems arise when applications are distributed over a global area.
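What transparent distribution looks like at a call site can be sketched with the standard `java.rmi` types. Only `Remote` and `RemoteException` are real API here; the `WeatherService` interface and the helper class are hypothetical names chosen for illustration.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote service declared in the usual RMI style.
interface WeatherService extends Remote {
    String forecast(String city) throws RemoteException;
}

final class RmiTransparencySketch {
    // The call has the same shape whether `service` is a local object or a
    // stub for an object on another machine; only the checked
    // RemoteException hints that a network may be involved.
    static String ask(WeatherService service, String city) {
        try {
            return service.forecast(city);
        } catch (RemoteException e) {
            return "unavailable";
        }
    }
}
```

The caller still blocks inside `forecast` for as long as the remote side takes to answer, which is where the local, synchronous and connected mental model stops matching reality.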

One of these problems relates to information propagation delay. With the speed of light as a limit, when applications are widely distributed, delays increase proportionally with distance and are no longer negligible. Moreover, in an open network, congestion and failures tend to increase the delay further. As a consequence, programmers who experience these delays and failures naturally tend to look at models other than the ones that try to hide distribution. Since the problems faced in a widely distributed system are identical to some of the problems encountered in autonomous systems, our discussion suggests that RMI and all other concepts based on transparent distribution are not adequate for autonomous systems. What we observe here is that the assumption of local, synchronous and connected interactions is challenged.
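The order of magnitude of the propagation delay is easy to estimate. The sketch below computes only the physical lower bound, using the speed of light in vacuum; the class and method names are hypothetical, and real networks are slower because of the transmission medium, routing and queueing.

```java
final class PropagationDelaySketch {
    static final double SPEED_OF_LIGHT_KM_PER_S = 299_792.458; // in vacuum

    // Lower bound on the one-way delay over `distanceKm`, in milliseconds.
    static double minOneWayDelayMillis(double distanceKm) {
        return distanceKm / SPEED_OF_LIGHT_KM_PER_S * 1000.0;
    }

    // A synchronous call needs at least one round trip before it returns.
    static double minRoundTripMillis(double distanceKm) {
        return 2.0 * minOneWayDelayMillis(distanceKm);
    }
}
```

For two nodes 20,000 km apart (roughly half the circumference of the Earth), the bound is about 67 ms one way and about 133 ms per round trip, so an interaction made of many small synchronous calls accumulates a delay that no implementation can hide.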

Robustness

Robustness is a system’s capability to tolerate failures or to minimize their impact. Java has been carefully designed so that program executions are as robust as possible. Robustness is important because it helps to enforce security and, since it reduces the occurrence of failures, it increases programmers’ productivity. Measures taken in Java for ensuring robustness include ensuring type preservation in the presence of casting, disallowing pointer arithmetic, requiring the presence of a garbage collector, and enforcing array bounds and null reference checking (i.e. checking before a dereference that the reference is not null). Further, we can also mention that neither the virtual machine specification [49] nor the bytecode specification [7] gives any indication of the memory position of the virtual machine data needed for application execution. All these measures reduce the chance of an attacker being able to corrupt a program execution.
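Three of the runtime checks listed above can be observed directly. The following sketch uses hypothetical class and method names; each helper provokes one check and reports the name of the standard exception that the Java runtime raises instead of letting the program corrupt memory.

```java
final class RuntimeChecksSketch {
    // Runs an action and reports which runtime check, if any, stopped it.
    static String failureOf(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Array bounds checking: index 2 is outside an array of length 2.
    static String outOfBounds() {
        int[] values = new int[2];
        return failureOf(() -> values[2] = 1);
    }

    // Null reference checking: the dereference is refused at run time.
    static String nullDereference() {
        String text = null;
        return failureOf(() -> text.length());
    }

    // Type preservation: a cast to an unrelated type is rejected.
    static String badCast() {
        Object value = "not a number";
        return failureOf(() -> { Integer n = (Integer) value; });
    }
}
```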

Another mechanism used in Java for improving robustness is exceptions. Exceptions signal that something went wrong in the program. Exceptions appeared in Java because of the lack of adequate mechanisms for handling failures in languages like C. In C, there is no standard way of signalling an error. Each API can define its own way of signalling errors (typically by returning -1). Java improved the situation by introducing one unique mechanism for this task. Further, Java forces programmers to take care of (catch or explicitly declare) checked exceptions when they can arise, which is not the case in C. C’s laxity can lead to programs that are very difficult to debug and therefore reduces application robustness.
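The difference between the two ways of signalling errors can be sketched as follows. The names are hypothetical and the "link" is simulated by a boolean; the point is only that the return-code variant can be silently ignored, whereas the compiler rejects any caller of the second variant that neither catches nor declares `IOException`.

```java
import java.io.IOException;

final class ErrorSignallingSketch {
    // C-style convention: a special return value (-1) signals failure and
    // nothing obliges the caller to test for it.
    static int readByteOrMinusOne(boolean linkUp) {
        return linkUp ? 42 : -1;
    }

    // Java style: failure is a checked exception that callers must either
    // catch or declare.
    static int readByte(boolean linkUp) throws IOException {
        if (!linkUp) {
            throw new IOException("link down");
        }
        return 42;
    }

    // A caller that handles the exception explicitly.
    static String describe(boolean linkUp) {
        try {
            return "read " + readByte(linkUp);
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }
}
```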

So far we have only discussed the Java way of ensuring robustness. We now turn our attention to autonomous systems, to see the kind of failures they encounter and whether the robustness mechanisms used by Java are adequate for them.

With the emergence of mobility, application robustness is challenged. In a world where wireless communications play an important role, disconnections are numerous. We classify these disconnections in two categories: voluntary and non-voluntary.

An important characteristic of ad-hoc networks is the continuous evolution of their configuration. A node leaving an ad-hoc network is an illustration of a voluntary disconnection. There are however other types of disconnections which are not the result of a clear will expressed by the user. Examples include a failure of an element of the wireless network infrastructure, a disconnection resulting from the movement of a mobile device through a tunnel, or the device’s presence in an area not covered by the wireless infrastructure. Further, latency can also give the impression of a failure. Indeed, the time required for a communication is never known in advance, since it depends on the network load and the distance that separates the communicating entities. Consequently, a user might have the impression of a disconnection whereas it is only a problem of latency.


All these examples show that in a wireless world, disconnections are anything but rare; this complicates the task of programming. What makes the task even more difficult is the fact that distinguishing between a failure and a temporary disconnection is not easy.

Indeed, if we were sure that disconnections in autonomous systems were truly failures, and moreover that these failure occurrences were only exceptional, then we would not need to develop other mechanisms than Java’s for ensuring robustness. The reason is that Java’s mechanisms were designed with the hope that failures happen only rarely and that it suffices to give the programmer the means to handle them.

To better illustrate this point, take the example of exceptions. An exception is raised during a program execution when an error is detected. In Java and similar languages, the hope and the assumption are that exception occurrences remain exceptional events. Indeed, in the opposite case, programming would be "exception oriented", and the resulting code would be plagued by exception handlers. This would lead to less readable and less structured code and consequently to more fragile programs.

Another example can be taken from the context of networks. TCP [61], a protocol of the Internet family, offers certain guarantees against failures. As with exceptions, the assumption is that TCP is used in environments where failures (loss of packets) are rare, because otherwise the protocol would no longer function correctly. Indeed, failure recovery in TCP is an expensive mechanism, since a single error requires many exchanges between the communicating entities to solve the problem. As a consequence, in an environment where failures are the norm rather than the exception, it would simply not be possible to execute applications running over TCP.

These two examples show that a lot of today’s paradigms in computer science have been built with the assumption that failures are exceptional. In autonomous systems however, failures or disconnections are not rare.

If we compare the kind of failures we encounter in Java and the kind of failures we encounter in autonomous systems, we notice that they are of different nature. Looking carefully at how Java takes care of failures, we can observe that Java focuses mostly only on handling logical programming errors, whereas in autonomous systems, we face another kind of errors as well,

(45)

Figure 2.5:

The Java 1.0 security model. All downloaded codes are isolated by being put into the sandbox region. They do not have access to the machine resources.

i.e. errors due to physical problems coming from the infrastructure. In Java, we can of course face non logical errors such an OutOfMemoryException, but they are rare compared to autonomous system errors due to disconnection.

Therefore, even though Java’s failure-handling mechanisms are certainly useful and necessary for autonomous systems, they are not sufficient. Additional primitives are required to cope with the new kind of failures encountered in autonomous systems. We discuss these primitives in the next chapter, where we introduce our programming model for autonomous systems.

Security

As code is mobile both in Java and in autonomous systems, multiple entities (programs), possibly representing multiple users, may run simultaneously on the same execution platform (virtual machine). This raises the problem of protecting (isolating) each program from interference by other programs running on the same machine.

Figure 2.6: The Java 1.1 security model. Downloaded code is either completely trusted, in which case it has access to all resources, or not trusted at all, in which case its access to resources is completely restricted.

Java solves the problem of program isolation through its implementation of the notion of a protection domain. Below we first present the notion of a protection domain in general terms, then show how it is implemented in environments other than Java, and finally illustrate how it is implemented in Java.

The goal of a protection domain is to isolate a program from other running programs. This means ensuring that a program running in one domain is not allowed to observe or alter resources possessed by another program in another domain. This is necessary in order to protect against confidentiality and integrity attacks [45].

In today’s operating systems such as Unix or Windows NT, isolation is enforced by hardware. In these systems, each protection domain is materialized as a process, and isolation is ensured by giving each process a separate address space.

A lot of effort has been put in recent research [90][11] into the design and implementation of protection domains with isolation furnished by
