An approach to the dynamic evolution of software systems


Thesis


ORIOL, Manuel

Abstract

In this PhD thesis we argue that connections between different software entities hinder the ability to make applications evolve at runtime. Our goal is thus to free entities from connections. To that end, we built a disconnected communication architecture based on three main concepts: associative naming, late binding and asynchrony of communications.

Communication occurs following an all-service approach (e.g. a method is a service) where a service request and invocation occur through a semantic description. The choice of the service that best matches the description of the requested service is performed at the moment of the invocation. In the thesis, we describe several implementations of disconnected architectures and applications. An interesting result is that we were able to obtain 99.99% availability for a web server (4 restarts in 18 months) while having some parts of the code modified more than 160 times.

ORIOL, Manuel. An approach to the dynamic evolution of software systems. Thèse de doctorat : Univ. Genève, 2004, no. SES 556

URN : urn:nbn:ch:unige-174078

DOI : 10.13097/archive-ouverte/unige:17407

Available at:

http://archive-ouverte.unige.ch/unige:17407

Disclaimer: layout of this document may differ from the published version.


UNIVERSITÉ DE GENÈVE

Faculty of Economic and Social Sciences, Department of Information Systems

An Approach to the Dynamic Evolution of Software Systems

THESIS

presented to the Faculty of Economic and Social Sciences of the Université de Genève

by

Manuel Oriol

from Grenoble (France)

for the degree of

Doctor of Economic and Social Sciences, specialization in Information Systems

Members of the thesis jury:

Mr. Ciarán BRYCE, senior lecturer and researcher

Mr. Bastien CHOPARD, adjunct professor, C.U.I.

Ms. Giovanna DI MARZO SERUGENDO, senior assistant

Mr. Dimitri KONSTANTAS, thesis supervisor

Mr. Michel LÉONARD, president of the jury

Mr. Luc MOREAU, professor, University of Southampton

Thesis no. 556, Geneva, 2004


The Faculty of Economic and Social Sciences, on the recommendation of the jury, has authorized the printing of this thesis, without thereby expressing any opinion on the propositions stated in it, which are the sole responsibility of their author.

Geneva, 20 April 2004

The Dean, Pierre Allan

Printed from the author's manuscript.

© Manuel Oriol 2004, All rights reserved


Acknowledgments

I would like to take this opportunity to thank the people who made it possible for me to put this document together.

First of all, I thank Professor Dennis Tsichritzis, who welcomed me into his group, the Object Systems Group, and thereby gave me the opportunity to begin a thesis in this field.

I also thank Professor Dimitri Konstantas for allowing me to finish this thesis in the same group and for the many interactions we had before and after his appointment. I will long remember the time we spent organizing the courses, my work, life after the thesis, and the Object Oriented Information Systems conference (OOIS'03).

I also thank Doctor Ciarán Bryce, on the one hand for following the first steps of my work and, on the other, for helping me until the very end to keep improving the thesis as a whole.

I extend my warmest thanks to Doctor Giovanna Di Marzo Serugendo, with whom I have mainly interacted since the approach presented in this document took shape. Our many discussions, the sessions of thoughtful and fundamental advice, and the occasional fits of laughter helped me finish this document more than anything else.

I warmly thank Professor Luc Moreau for his always constructive and extremely detailed feedback during the writing of this document. I am convinced that it had a saving effect in many respects.

I also thank Professor Bastien Chopard for his kindness and friendliness while this work was being prepared.

Past and present members of the Object Systems Group have all, in their own way, shaped my work over the years. Their comments and encouragement have been extremely valuable. I particularly thank Professor Jan Vitek, who helped me find my subject, Doctor Jean-Henry Morin, whose enthusiastic advice was particularly appreciated, and Chrislain Razafimaefa and Michel Pawlak, my travelling companions.

The members of the Centre Universitaire d'Informatique have all contributed to the warm atmosphere that reigns in the Uni Dufour building. It is impossible to thank them one by one, but I will long remember my students, my colleagues from the Outils Informatiques course, and the other teachers and researchers of the center.

Without support from outside, this work could not have seen the light of day, so I thank my closest friends: Antoine, Carole, Cédric, Laurent, Marc, Marc, Marc, Michel, Mira, Nathalie, Thierry and Yan. Thank you for being who you are. I likewise thank the "gamers" of the CUI, who kept me company in moments of relaxation, and more generally the people who have populated my world over these last few years.

Finally, I particularly thank my parents and grandparents, who have always given me their love and trust, and to whom I dedicate this document.


Acknowledgments

I would like to use this opportunity to thank the people who helped me build this document.

First of all, I want to thank Professor Dennis Tsichritzis for hosting me in his group, the Object Systems Group, and thus letting me begin a PhD thesis in object orientation.

I am also particularly grateful to Professor Dimitri Konstantas for letting me finish this thesis in the same group and for the numerous interactions that we have had over the years. I will long remember the time that we spent organizing courses, my work, what comes after the PhD, and the conference Object Oriented Information Systems (OOIS'03).

I also thank Doctor Ciarán Bryce for following my work when it was at its beginning and for helping me improve the PhD thesis until the end.

I also want to address my best thanks to Doctor Giovanna Di Marzo Serugendo, with whom I interacted almost exclusively about the main course of the ideas presented in this document. The numerous discussions, working sessions, and sometimes laughs that we have had during this work helped me finish it more than anything else.

I also want to thank Professor Luc Moreau for his always extremely detailed feedback while I was building this document. I am sure that his "suggestions" helped much in my efforts.

I also thank Professor Bastien Chopard for his kindness and sympathy while I was working on this document.

Past and present members of the Object Systems Group have all, in their own way, modified my work over the years. Their comments and cheers have been most precious. I particularly thank Professor Jan Vitek, who helped me find a subject, Doctor Jean-Henry Morin for his valuable and enthusiastic advice, as well as Chrislain Razafimaefa and Michel Pawlak, my road companions.

Members of the Computer Science University Center (CUI) all helped to build the warm working environment that exists in Uni Dufour. It is impossible to thank them one by one, but I particularly remember my students, my colleagues on the course Computer Tools, and the other researchers from CUI.

Without many people from outside the university I would not have finished this work, which is why I wish to thank my dearest friends: Antoine, Carole, Cédric, Laurent, Marc, Marc, Marc, Michel, Mira, Nathalie, Thierry and Yan. Thanks to them for being as they are. I also thank the "gamers" from the CUI who were with me during spare time and, more generally, the people who populated my little world during these last years.

Finally, I want to thank my parents and my grandparents, who have always given me their love and their confidence, and to whom I dedicate this document.


Summary

More and more applications need to run continuously. These applications may be servers performing a wide range of tasks on the Internet, as well as applications managing mobile devices such as mobile phones, cars, satellites or even nuclear power plants. Such applications must be able to evolve over time: when bugs are discovered and fixed, when a feature is added, or when the topology of their environment changes. The primary goal of this thesis is to show that applications can be programmed so that their evolution is integrated transparently at runtime, without stopping either the evolving application or the applications that use it.

We describe dynamic systems as the set of applications that can evolve over time, piece by piece. This set includes component-based applications, in particular agent-based applications, whose communication partners may change over time, as well as peer-to-peer (P2P) systems, for which evolution of the topology is canonical behavior.

Our approach consists in modularizing applications into several independent parts that can be regarded as components, modules, agents or network nodes; we refer to them by the generic term entity. Fundamentally, entities provide services and may request services. Our architecture rests on the principle that entities wishing to invoke a service do not name it: they describe what they want. Entities providing services describe and announce them. The communication infrastructure dynamically chooses the best service to invoke.

Our approach focuses on defining: (a) a mini-language for describing services, used by both the requester and the provider of a service; and (b) a communication infrastructure for invoking, announcing and withdrawing services. In the mini-language, we define how to describe services so as to obtain associative and anonymous invocations. In the general infrastructure, we describe an asynchronous invocation mechanism and integrate the anonymous and associative aspects. A service invocation is associative because the description of the desired service is compared with those of the offered services, and the best suited of the offered services is the one actually invoked. The invocation itself is anonymous because it is not possible to hold a reference or a name describing the entity that performs or requests a service. Invocations are asynchronous because calling entities do not wait for the invocation to finish. They continue their execution and, if needed, create a return service to obtain a response.

The first part of the thesis presents related work. The second part defines our model: service descriptions, invocations and the communication infrastructure. In the third part, we describe three implementations of our model: (a) a centralized infrastructure that allows local, component-based applications to evolve dynamically; and (b) two distributed infrastructures that allow large-scale applications to evolve dynamically. The first implementation shows that our structural principles are suited to the dynamic evolution of entities that are components of an application. The second and third implementations show that our approach can be used successfully to manage distributed invocations in networks of dynamic systems (the entities being located on the nodes of the network). In the fourth part, we show how these concepts can be applied to other domains such as peer-to-peer or web services. Finally, we outline the research directions that our work opens.


Abstract

An increasing number of applications need to run all the time. These applications may be servers that perform a wide range of tasks on the Internet as well as applications managing physical devices like mobile phones, cars, satellites or even nuclear power plants. Nevertheless, such applications need to evolve over time: when bugs are discovered and fixed, when some functionality is added, or when the topology changes. The primary goal of this thesis is to show that it is possible to program applications such that their evolution may be seamlessly integrated at runtime without stopping either the application that evolves or other applications that use it.

We describe dynamic systems as the wide set of applications that may evolve over time, part by part. This set includes component-based applications, and in particular agent-based applications, for which communication partners may change over time, as well as peer-to-peer (P2P) systems, for which topology evolution is natural behavior. Our approach consists in modularizing applications into several independent parts that may be considered as components, modules, agents or network nodes; we designate them by the generic term entities. Fundamentally, entities provide services and may request services. Our architecture relies on the principle that entities willing to invoke a service do not name it: they describe what they want. Entities providing services describe and announce them. The communication infrastructure dynamically chooses the best service to invoke.

Our approach concentrates on defining: (a) a mini-language for describing services, used both by the caller and the callee of a service; and (b) a general communication infrastructure for invoking, announcing and removing services. In the mini-language, we define how to describe services in order to have associative and anonymous invocations. In the general infrastructure, we describe an asynchronous invocation mechanism, and integrate the anonymous and associative aspects. A service invocation is associative, since the desired service description is compared to the proposed services, and the best adapted of the proposed services is actually invoked. The invocation itself is then anonymous, as it is not possible to have any reference or name describing the entity that serves or requires a service. Invocations are asynchronous because calling entities do not wait for the invocation to end. They continue their execution and possibly create a return service to get an answer.

The first part of the thesis presents related work in this area. The second part defines our model: service description, service invocation and the communication infrastructure. In the third part, we describe three implementations of our model: (a) a central infrastructure that allows local component-based applications to evolve dynamically; and (b) two distributed infrastructures that allow wide-scale applications to evolve dynamically. The first implementation shows that these infrastructure principles are suited to making entities that are components of an application evolve at runtime. The second and third implementations show that our approach may be used successfully for managing distributed invocations in networked dynamic systems (entities being located at nodes of the network).

In the fourth part of the thesis, we show how these concepts may be applied to other domains like peer-to-peer or Web services. Finally, we outline the research leads that our contributions open.


À mes parents, à mes grands-parents.

To my parents, To my grandparents.


List of Figures

3.1 General Architecture
3.2 Entity
3.3 Service Publication, Request, and Invocation
3.4 Service Return
3.5 Service Evolution
3.6 Caller Evolution
3.7 Service Description
3.8 Tree Matching
3.9 Terms Definition
4.1 LuckyJ Platform: Service Publication
4.2 LuckyJ Platform: Service Request and Invocation
4.3 LuckyJ Platform: Service Return
4.4 Entities Monitor
4.5 Services Monitor
4.6 Service Manager Monitor
4.7 Text Editor Entities
4.8 Scalability Benchmark: 500 services
4.12 Service Description Benchmark: 250 services with length of 1000
4.13 Service Description Benchmark: 50 services with length of 2500
4.14 Evolution Benchmark: from 1 to 45000 entities
5.1 Centralized Approach
5.2 Semi-Centralized Approach
5.3 Decentralized Approach
5.4 Distributed LuckyJ: Service Publication
5.5 Distributed LuckyJ: Service Request and Invocation
5.6 Distributed LuckyJ: Service Evolution
5.7 Semi-Centralized Distributed LuckyJ: Architecture
5.8 Semi-Centralized Distributed LuckyJ: Invocation
6.1 A General Interface for the Tic-Tac-Toe (Morpion in French)
6.2 A Graphical Interface (called Flashy in figure 6.1)
6.3 A Graphical Interface (called Graphique in figure 6.1)
6.4 A Textual Interface (called Texte in figure 6.1)
6.5 The Evolving Web Server
6.6 Stickers Application General Architecture
6.7 A Sticker
6.8 A PostItManager
6.9 General Methodology
6.10 Evolution Methodology
6.11 Evolution Choice
6.12 Pattern figures legend
6.13 Return Pattern
6.14 Factory Pattern
6.15 Shared Variables Pattern
6.16 Singleton Pattern
6.17 Iterator Pattern
6.18 Proxy Pattern
6.19 Telephonist Pattern
6.20 State Translation Pattern
6.21 Entity Splitting Pattern
6.22 Entity Unification Pattern


List of Tables

4.1 Comparative benchmark results
4.2 Delta values
6.1 Comparison between WeeselJ and FlashEd


Chapter 1 Introduction

Today, computation is everywhere. Programs make choices, serve, and help people do almost every common task: they are used for managing cars, portable phones, computers, nuclear power plants, satellites and many other simple or complex devices. Many of them need nearly one hundred percent availability. A perfect program would provide such a quality of service, but no program is perfect, for two main reasons. First, programs are made by people, so perfection, and with it bug-free software, is out of reach. Buggy software therefore needs to be corrected, for example to remove security holes or to make it work better. Second, initial requirements change over time as user needs evolve and as advances in technology and knowledge in a particular field change the way the software was envisioned to work.

The main point we address in this thesis is how to allow software evolution at runtime, in a transparent way.

1.1 Unanticipated Dynamic Software Evolution

In this section we give an overview of what evolution of code is, what it consists of, and its taxonomy. Clear definitions of the evolution paradigm are difficult to find for a simple reason: the notions seem intuitive to any computer scientist, since the problem arises at every coding or maintenance step in everyday tasks. Nevertheless, these notions are often misunderstood. For example, unanticipated software evolution is one of the terms that tends to be defined by negation (as in the description of the workshops on this subject):

"By definition, unanticipated software evolution (USE) is not something for which we can prepare during the design of a software system. Therefore, support for such evolution in programming languages, and component models and related runtime infrastructures becomes a key issue. Without it, unanticipated changes often force software engineers to perform extensive invasive modification of existing designs and code." [91].

In this section we intend to provide clear definitions for the following terms: software evolution, marshaled/unmarshaled software evolution, dynamic/static software evolution, anticipated/unanticipated software evolution.

1.1.1 Software Evolution

Research on software evolution is the field interested in finding the rules and identifying the patterns that govern the changes made by programmers or maintainers of an application.

This means that research on software evolution concerns the changes that lead to minimal inconsistencies, as well as how programmers actually make their code evolve over time. It is a large field that may include automated evolution of programs (genetic algorithms [64], for instance) as well as reverse engineering techniques [20]. As stated previously, this vague and broad definition is very intuitive for most programmers and maintainers. The usual understanding is that modifying software is complex and implies a completely new development cycle with new testing and production phases, which is always risky in terms of resources consumed. This is especially true for applications that were not developed by the people who modify them. Our feeling is that evolution of code may be eased by programming conventions or validated through the use of particular techniques (e.g. subtyping, reuse contracts [158], DVM [101]). In such a case we talk about marshaled software evolution. The problem with such approaches is that they generally restrict considerably the nature of the changes that may be made. In contrast, unmarshaled software evolution aims at as wide a range of evolution possibilities as possible: any arbitrary change to the code may be made. Unmarshaled evolution is what we address in this work.

1.1.2 Dynamic Software Evolution

Two main fields of software evolution may be defined, orthogonally to marshaled/unmarshaled evolution: static and dynamic evolution.

Static software evolution: consists in evolving the code of an application while it is stopped. The advantage is that there is no question of state transfer or active threads to solve. The problem is that stopping an application means stopping the services it provides, and thus temporary unavailability.

Dynamic software evolution: consists in evolving an application during its execution, without stopping it. The advantage is that there is no unavailability. The problem is that technical issues are still very uncertain.

As described by Mens et al. [107], there is a large set of properties related to software evolution and, in particular, numerous properties concern the times at which changes may be made. Changes may typically occur at three different points in time:

Compile-time changes: these changes occur when programmers recompile their source code. Handling such changes consists in having mechanisms that ease the task of modifying source code. Typically, methodologies related to typed languages (hence languages themselves) help programmers in this task.

Load-time changes: these changes occur when an execution platform is able to load code and possibly modify it at runtime. This type of change is typically provided by dynamically loadable libraries [144] and the Java system ClassLoader mechanism [67].

Runtime changes: these changes may occur at any time during the execution of the program. This type of change is still intensely investigated by researchers, and no well-accepted solution has emerged, as discussed in chapter 2.

Compile-time changes are generally identified as a static evolution mechanism while runtime changes correspond to a dynamic evolution mechanism.

Load-time changes are a less well-defined mechanism. Load-time may be considered a static evolution mechanism as well as a dynamic evolution mechanism (when an unloading mechanism is available), depending on how it is used by platform/application builders. When used to load classes and code on demand, it acts as a static evolution mechanism because it may reflect extensions of the classes while the actual set of loaded code is still unchanged. Another possibility is to add a layer that loads code dynamically at runtime and unloads older versions. In this case, the code may evolve dynamically through successive versions: it acts as a dynamic evolution mechanism.

Runtime changes mean that parts of the code and objects may be modified during execution (a typical dynamic evolution mechanism). This means that strategies have to be found to solve a large variety of problems, like transferring the states of objects and of running methods.

In this dissertation, we are interested in unmarshaled dynamic software evolution, which means that we want to make an application evolve during its execution without restricting the evolution possibilities.

1.1.3 Unanticipated Software Evolution

As we showed, evolution can be marshaled or unmarshaled and may occur at different points in time. Additionally, evolution can be anticipated or unanticipated. This constitutes a third orthogonal axis. We define the difference between anticipated and unanticipated evolution as follows:

Anticipated Evolution: is an evolution that has been foreseen by the programmer. As an example, plug-in technologies (like Java servlets [84] or Adobe Photoshop plugins [123]) and object-oriented language inheritance are mechanisms that allow programmers and maintainers to extend functionality, but not to modify the heart of the application itself. The advantage is that it is possible to provide mechanisms that rely on simple loading/unloading primitives in the case of dynamic software evolution, or on providing an API in the case of static evolution. The problem is that it lacks flexibility in the possible changes.

Unanticipated Evolution: unanticipated evolution consists in evolution that has not been foreseen by the programmer. Changes have thus to be supported either by the language or the execution platform (in the case of dynamic evolution). The advantage is that a wide variety of changes is available. The problem is that there is still no widely accepted solution for such a platform.
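To make the contrast concrete, here is a minimal Java sketch of anticipated evolution through a plug-in mechanism. The names `Plugin`, `PluginHost`, `load` and `unload` are hypothetical and are not taken from any platform discussed in this thesis. The host foresees exactly one extension point: plug-ins can be loaded and unloaded while it runs, but the logic of `run` itself cannot be changed this way, which is the lack of flexibility described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical extension point foreseen by the application designer.
interface Plugin {
    String apply(String input);
}

// Hypothetical host: plug-ins may come and go, the core loop may not change.
class PluginHost {
    private final Map<String, Plugin> plugins = new LinkedHashMap<>();

    // Simple loading primitive.
    void load(String name, Plugin plugin) {
        plugins.put(name, plugin);
    }

    // Simple unloading primitive.
    void unload(String name) {
        plugins.remove(name);
    }

    // Fixed core behavior: feed the input through every loaded plug-in in load order.
    String run(String input) {
        String result = input;
        for (Plugin p : plugins.values()) {
            result = p.apply(result);
        }
        return result;
    }
}

class AnticipatedEvolutionDemo {
    public static void main(String[] args) {
        PluginHost host = new PluginHost();
        host.load("upper", s -> s.toUpperCase());
        host.load("exclaim", s -> s + "!");
        System.out.println(host.run("hello"));   // HELLO!
        host.unload("exclaim");
        System.out.println(host.run("hello"));   // HELLO
    }
}
```

An unanticipated change in this sketch would be, for instance, altering the order or the composition rule inside `run`, for which the host offers no primitive.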

In this work we chose to treat the problem of unmarshaled, dynamic, unanticipated software evolution in object-based languages. This is done by providing platform-level mechanisms for letting programs evolve at runtime.

The great challenge in doing so is developing an infrastructure that allows smooth dynamic changes. Thus, the challenge is to change only the parts that need to evolve and still let the others interact with either old versions of the code or newer ones. In the following section we explain how we accomplish this and which design choices we made.


1.2 Toward Disconnection

Our work concentrated on object-oriented languages. These languages are designed to provide maximum flexibility through mechanisms that let the programmer reuse most of the code previously written, thus making a whole system evolve with minimal effort. In such languages, static changes may already lead to errors. As an example, method capture, as shown by Steyaert et al. [158], may lead to unexpected behavior and an inconsistent state.

However, object-oriented languages have many other shortcomings when it comes to providing dynamic and unanticipated evolution. The main one is that, in running programs, parts of the program are linked by references to the associated data and code structures, according to their respective roles. This concept is implemented in traditional programming languages either by pointers, in languages such as C and C++, or by references, in languages such as Java, Eiffel and scripting languages.

Concretely, the connection effect arises in a few cases that we identify here. We say that a piece of code is connected to another in the following cases:

• When a class inherits from another.

• When an object is an instance of a class.

• When there is a direct reference from an object to another.

• When there is a blocking method call from one class to another.

• When there are synchronization constraints between different pieces of code.

This is a stronger version of the isolation notion that is presented in the ClassLoader abstraction [67] and isolates [138, 30]. In this thesis, we advocate that it is possible to develop applications without connections between their main parts.
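As a rough illustration of one of the cases listed above (the direct reference), the following Java sketch contrasts a client bound to a specific object with a client that resolves a provider from a textual description at each call. All names (`ConnectedClient`, `Registry`, `DescribingClient`) are hypothetical. The sketch removes only the direct reference between client and provider: the call is still blocking and the client still references the registry, so it is a partial step and not the disconnected architecture itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Connected: the client holds a direct reference to one specific object.
class ConnectedClient {
    private final Supplier<String> greeter;   // direct reference, fixed at construction

    ConnectedClient(Supplier<String> greeter) {
        this.greeter = greeter;
    }

    String greet() {
        return greeter.get();                 // call through the stored reference
    }
}

// Hypothetical indirection: providers are registered under a textual description.
class Registry {
    private final Map<String, Supplier<String>> providers = new HashMap<>();

    void announce(String description, Supplier<String> provider) {
        providers.put(description, provider);
    }

    Supplier<String> lookup(String description) {
        return providers.get(description);
    }
}

// The client only knows a description and resolves it at every call.
class DescribingClient {
    private final Registry registry;

    DescribingClient(Registry registry) {
        this.registry = registry;
    }

    String greet() {
        return registry.lookup("greeting").get();
    }
}

class ConnectionDemo {
    public static void main(String[] args) {
        Registry registry = new Registry();
        Supplier<String> v1 = () -> "hello (v1)";
        registry.announce("greeting", v1);

        ConnectedClient connected = new ConnectedClient(v1);
        DescribingClient describing = new DescribingClient(registry);

        // Evolve the provider at runtime by announcing a new version.
        registry.announce("greeting", () -> "hello (v2)");

        System.out.println(connected.greet());   // hello (v1): still bound to the old object
        System.out.println(describing.greet());  // hello (v2): resolved at invocation time
    }
}
```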

1.3 A Guided Tour to the Disconnected Architecture

Disconnection is a major focus of our work, and the disconnected architecture we outline in this section is an attempt to provide a way for programmers to build disconnected applications. It is based on the concepts of anonymity, associative naming and asynchrony. We call the basic building blocks of an application entities; they may communicate only through services. A communication infrastructure handles services and service invocations.

The core of an entity consists of an instance of a specialization of a base class entity. An entity is an aggregate of this instance along with its class, classes and instances it uses. An entity constitutes a protection domain and a name-space with which no other entity may interfere using methods or references. Thus, entities are anonymous.

Each entity announces the services that it provides to the communication infrastructure. A service announcement consists of an abstract description that states the service's functionality, inputs/outputs, and quality of service.

An entity may request a service to be invoked by providing a description of its needs to the communication infrastructure, namely the service manager. The service manager then chooses among the most suitable services, according to their respective descriptions, and invokes the service.

The choice is thus made anonymously using associative naming, i.e. the entity to contact for service invocation is chosen using the description. The invocation process itself is asynchronous, and the only synchronous part lies in returning a tag that references the communication. This tag may then be used by entities to establish a long-lived communication between the caller and the callee of the service.
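The interaction described in this section can be sketched in a few lines of Java. This is an illustrative sketch under strong assumptions, not the platform presented later in the thesis: the class `ServiceManager` and its methods `announce`, `invoke` and `awaitResult` are hypothetical, descriptions are reduced to sets of keywords, and "most suitable" is approximated by the largest keyword overlap. Only the tag is returned synchronously; the service body runs on another thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical service manager; descriptions are keyword sets for illustration only.
class ServiceManager {
    private static final class Offer {
        final Set<String> description;
        final Function<String, String> body;

        Offer(Set<String> description, Function<String, String> body) {
            this.description = description;
            this.body = body;
        }
    }

    private final List<Offer> offers = new ArrayList<>();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicLong nextTag = new AtomicLong();

    // An entity announces a service by describing it; no name or reference is exposed.
    synchronized void announce(Set<String> description, Function<String, String> body) {
        offers.add(new Offer(new HashSet<>(description), body));
    }

    // Associative choice: pick the offer sharing the most keywords with the request.
    private synchronized Offer bestMatch(Set<String> wanted) {
        Offer best = null;
        int bestScore = 0;
        for (Offer o : offers) {
            Set<String> common = new HashSet<>(o.description);
            common.retainAll(wanted);
            if (common.size() > bestScore) {
                best = o;
                bestScore = common.size();
            }
        }
        return best;
    }

    // Only the tag is returned synchronously; the service body runs asynchronously.
    long invoke(Set<String> wanted, String argument) {
        Offer chosen = bestMatch(wanted);
        if (chosen == null) {
            throw new IllegalStateException("no matching service");
        }
        long tag = nextTag.incrementAndGet();
        pending.put(tag, CompletableFuture.supplyAsync(() -> chosen.body.apply(argument)));
        return tag;
    }

    // Stand-in for a return service: the caller uses the tag to collect the answer later.
    String awaitResult(long tag) {
        return pending.remove(tag).join();
    }
}

class DisconnectedDemo {
    public static void main(String[] args) {
        ServiceManager manager = new ServiceManager();
        manager.announce(Set.of("print", "text"), s -> "printed:" + s);
        manager.announce(Set.of("print", "text", "color"), s -> "color-printed:" + s);

        long tag = manager.invoke(Set.of("print", "color"), "doc");
        // The caller could continue with other work here before asking for the answer.
        System.out.println(manager.awaitResult(tag));   // color-printed:doc
    }
}
```

The caller never learns which entity served the request: it holds a description and a tag, nothing else. `Set.of` assumes Java 9 or later.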

This disconnected architecture was implemented in three distinct cases:

• a local implementation that allows to make an application evolve at runtime by replacing entities and transferring the state between old and new versions.

• a distributed implementation that (1) makes the components of an application distributed on a network evolve, (2) allows to reuse entities that had primarily been written for the local implementation, (3) is centralized though distributed (the communication infrastructure centralizes all services on a single platform althoug eﬀective invocations are distributed on the whole network), (4) allows to have a running local implementation (running on the local implementation) be distributed at runtime.

• a distributed peer-to-peer implementation that (1) makes the components of an application distributed on a network evolve, (2) allows the reuse of entities that were originally written for the local implementation, (3) is semi-centralized though distributed (the communication infrastructure centralizes all services on a set of given platforms although the actual invocations are distributed over the whole network).

1.4 Contributions

This thesis is concerned with unmarshalled, dynamic, unanticipated evolution of object-oriented applications, whether distributed or not. The intention is to provide a framework that allows seamless evolution of applications. The main contributions are the following:

• A framework for unanticipated dynamic software evolution that identifies the connection as a major impediment to seamless dynamic evolution.

• A communication model between components that has the following properties: it is disconnected, anonymous and asynchronous. This communication model shows that it is possible to design an infrastructure with such properties and that it helps dynamic evolution.

• Three implementations of this model: a local one, a centralized distributed one, and a peer-to-peer one. These implementations show that the model we built is applicable at several levels: intra-application evolution, distribution of applications on-the-fly, and evolution in a peer-to-peer network.

• Experience reports on these implementations that show, in particular, 99.99% availability for a web server built on top of our architecture. Over a period of a year and a half, more than 160 different versions of some parts of the code ran, and the server was stopped 5 times, mostly for external reasons. We also present two other applications that show how we can distribute an application over the network and that inexperienced programmers can program easily with our platform. All these experiments allowed us to build a non-definitive group of programming patterns.

1.5 Thesis Overview

The rest of the thesis is organized as follows. In chapter 2, we look at related work. In chapter 3, we detail the model we built to support dynamic software evolution. In chapter 4, we discuss the local implementation. In chapter 5, we present the two distributed implementations. In chapter 6, we show various examples of applications we built on top of our implementations. We also show some programming patterns that we could extract from these experiences. In chapter 7, we give an idea of other fields in which the models and implementations produced could be useful. Finally, we conclude in chapter 8.


Chapter 2 State of the Art

In this chapter, we present different works that contribute to the discussion on software evolution. As stated in chapter 1, software evolution is the field of computer science concerned with effecting changes to software. To study software evolution, we prefer to classify by type of change rather than by the static/dynamic duality, which is rather confusing. Our main supporting argument is that similar change techniques may be used to achieve different types of evolution.

As defined in chapter 1, there are mainly three types of changes:

• Compile-time changes.

• Load-time changes.

• Runtime changes.

Note that we first consider these changes in their standard use, but also give an idea of the limitations they imply when used for unanticipated dynamic software evolution.

In the following, we detail each of these types of change in a separate section and conclude with overviews of other techniques that one may use to build an evolution platform.

2.1 Compile-Time Changes

In this section we give an overview of the various techniques used to provide compile-time changes. In the first part, we describe what types of changes subtyping provides. In the second part, we explain what is possible in the various object technologies that have arisen since objects emerged. Finally, we give an overview of the techniques used in the reengineering field.


2.1.1 Subtyping and Subclassing

In the object-oriented paradigm, subtyping is commonly regarded as a very good mechanism for evolution and reuse of code [87, 113, 7, 100]. In fact, it may only be used to extend programs. Subtyping a class means, schematically, keeping the parameters and methods of the parent class and possibly adding new ones (a subtype may add new parameters and methods, but must keep those of the previous class).

Simula-67 [32] (often referred to as the first object-oriented language) uses this mechanism to extend functionalities through class specialization. In Smalltalk-80 [62, 63, 111] and Java [67], each class is a subclass of another, except Object, which is at the top of the resulting inheritance tree. In Eiffel [113, 112], the inheritance mechanism supports multiple inheritance. This means that conflicts may arise between two method names that have been defined twice in the upper trees. For the extension mechanism to work correctly, two different constructs help the method lookup: redefining (overriding) and renaming methods. Redefining lets the programmer provide a new implementation for a method. Renaming a method lets the programmer differentiate two methods that have the same names and parameters but provide significantly different functionalities. C++ [162, 164] also lets programmers use multiple inheritance, and several articles have discussed how to provide a good inheritance mechanism [155, 163]. The commonly accepted solution at the moment is to reference inherited methods using the originating class name as a prefix for the inherited method when called.

Looking at all these mechanisms, the main principle behind subtyping and subclassing is that a subclass acts as an extension of the older class. It may then provide new functionalities by adding methods. Overriding (redefining a method in a subclass), combined with polymorphism, acts as the main evolution mechanism. In fact, to build new versions of parts of an application, the simplest mechanism is to build a subclass of a class previously used in the same role, then to override the methods that need to be changed, and finally to use polymorphism to call the newly created feature in place of the old one. Nevertheless, this mechanism implies that the typing constraints are verified between newer and older versions of the same class.
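A minimal sketch of this subclass-and-override style of evolution follows. It is written in Python for brevity, and the class and function names (Greeter, PoliteGreeter, client) are hypothetical, chosen only for illustration.

```python
class Greeter:
    """Version 1 of a component."""

    def greet(self, name):
        return f"Hello {name}"


class PoliteGreeter(Greeter):
    """Version 2: a subclass that keeps the old interface and extends it."""

    def greet(self, name):
        # Overriding: same signature, new behavior.
        return f"Good day, {name}"

    def farewell(self, name):
        # New functionality added by the extension.
        return f"Goodbye, {name}"


def client(greeter):
    """Client code written against version 1; it is not modified."""
    return greeter.greet("Ada")


print(client(Greeter()))
print(client(PoliteGreeter()))
```

The client keeps working because PoliteGreeter preserves everything Greeter offered. Splitting Greeter into two classes, or merging it with another one, could not be expressed this way.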

But this is not the common case. In many situations of unanticipated evolution [158], a large number of changes may be needed, such as class splitting or unification of classes. These changes consist in splitting an older version of a class into two new classes, or merging two old classes into one, in the new version of the program. These changes are not supported by classical inheritance mechanisms.

While realizing static evolution, type inconsistency across versions is a problem that may easily be solved. In fact, types may be adapted as needed to satisfy type consistency. As an example, adding an abstract method in a parent class is straightforward (even if tedious), as compilers indicate where this method must be made concrete. Nevertheless, there are cases where it may be very annoying. As an example, when using libraries that cannot be altered (the programmer of the final application is not the author of the library), keeping changes type-safe may simply not be possible, and thus the application does not compile or work. In a dynamic evolution situation, modifying the type of running objects remains a real difficulty, as no approach at the moment proposes solutions for class splitting or unification (see section 2.3).

2.1.2 Controlling Object-Oriented Evolution.

In the object world, a number of works provide techniques that control evolution at compile-time. The reason for such approaches is that at compile-time any arbitrary change may be made. This means that, to achieve correctness while reusing previously written code, designers have to understand the reused code itself. This is somewhat in contradiction with the idea that reusing lines of code should not require the effort of rewriting them or understanding them again in detail. In fact, in potentially large projects it may be impossible for a single programmer to understand all the code. Thus, techniques have had to be developed to handle the inherent complexity of reusing (and possibly modifying) existing code.

In the field of object modeling, methods are often concerned with the evolution of object behavior over time [146]. This means that the object’s changes of behavior are modeled through state machines [148]. It is often expected that all modeling problems are solved at design-time, and the maintenance of the application is rarely taken into account. Only a few works [43, 88, 105] consider evolution of code and models. One of the leads followed by researchers is reuse contracts [158] (explained in detail below). Others are very sparse, and it seems that evolution of modeling has been left to maintenance [95, 104]. In this field, studies mainly concentrate on building metrics [98, 29] to analyze the impact of the maintenance process in terms of time, or study how to change existing code (see the next subsection for details on reengineering techniques). In the following, we look at both tendencies we outlined: contracts and aspects.

Contracts. Contracts constitute a research path to control evolution and to allow code to be reused or modified with maximum correctness. As an introductory example, the Eiffel language [113, 112] uses assertions through preconditions and postconditions in the definition of methods, and invariant checks on classes and instances. Preconditions give the runtime environment as well as programmers a means to verify that they use a method correctly, with the possibility of checking the conditions it contains both at runtime and at compile-time. As an example, they may contain tests on input variables, object variables or helpful comments. Postconditions define properties that may be verified by the programmer at compile-time and by the runtime environment after the invocation of the method has finished. Those two mechanisms are primarily intended to provide a flexible notation for raising exceptions and to help programmers in the design of the application. Finally, invariant checks are used to verify that some conditions hold at all times on the class and its instances. The value of the mechanisms present in Eiffel becomes very striking as soon as a programmer has to reuse previously written code. Java programmers’ habit is to look in a class’s API for the possible errors and misuses. Eiffel programmers’ habit is to look in the assertions for the possible errors due to misunderstanding how the class is used.
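The three kinds of assertion can be imitated in a language without built-in contracts. The sketch below is an assumption-laden illustration in Python, not Eiffel syntax: the Account class, the check helper and the ContractViolation exception are hypothetical names introduced for this example, and all checks happen at runtime only.

```python
class ContractViolation(Exception):
    """Raised when a precondition, postcondition or invariant does not hold."""


def check(condition, message):
    if not condition:
        raise ContractViolation(message)


class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self._invariant()

    def _invariant(self):
        # Class invariant: must hold after every public operation.
        check(self.balance >= 0, "invariant: balance must not be negative")

    def withdraw(self, amount):
        # Precondition: the caller's obligation.
        check(0 < amount <= self.balance, "precondition: 0 < amount <= balance")
        old_balance = self.balance
        self.balance -= amount
        # Postcondition: the method's promise to the caller.
        check(self.balance == old_balance - amount, "postcondition: balance decreased by amount")
        self._invariant()
        return self.balance


account = Account(100)
print(account.withdraw(30))
```

A programmer reusing Account can read the three checks instead of the method bodies to learn what a correct use looks like, which is the habit attributed to Eiffel programmers in the text.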

At a class hierarchy level, the idea behind contracts is that modularizing an application and delegating the responsibility of verifying module interactions to the module programmers themselves is a way to ensure required properties and correctness of execution [135]. More precisely, reuse contracts intend to detect at compile-time the method (re-)definitions that may interact in an unwanted way. In reuse contracts [158] and their extensions [109, 108, 106, 105], the delegation is made through an API-like formal definition of the methods declared and used in a class. It contains methods that may be abstract or concrete and, when they are concrete, the other methods that they call. This allows programmers and designers of a new version of the application to be informed in 4 cases: (1) when a method may enter into conflict because a parent class defines a method that was previously defined in a child class, (2) when a parent class adds an abstract method that is not implemented in its children, (3) when a method is used more frequently than before, (4) when a parent’s method is invoked less frequently than before. This means that reuse contracts help programmers and designers of an application’s new versions to understand that a new version of the code captures some features that were not originally captured. This may be the desired feature in the new version but, in the general case, the situations described above occur when a developer or a designer modifies an important interaction (of which he was not aware) hidden by the general mechanisms of inheritance, polymorphism and encapsulation.

In Eiffel as well as in reuse contracts, we can see a will to provide greater control in the reuse case. This means that implementors and designers must be aware that their changes may interact poorly with the rest of the application because of the general interconnection of classes. In both cases, the definition of what classes expect to be valid and what they do is included in the class by designers and/or programmers.

In both cases, the check for consistency is made later (at compile-time or run-time).

Aspects. Aspect-Oriented Programming (AOP) [90, 177, 97] is one of the recent concepts that has generated the greatest interest in the software engineering and software programming research community. It benefits from theoretical frameworks [110, 52] and several implementations [89, 78]. The main idea behind AOP is that it is possible to write programs in parts that may be recomposed (woven) at compile-time to produce the application. These parts correspond to identified domain- or application-specific behaviors (namely aspects) of the system to be produced. In this context, the main property that lets aspects be composed to produce the application is that aspects do not “cross-cut” each other. This means, following our terminology, that there is a decoupling between aspects: they do not have to be coded together.

AOP is primarily designed as a way to manage evolution in a controlled way. Nevertheless, a large number of questions are still open from this point of view. In the published examples of AOP developments, aspects may clearly be identified because they have few connections. As an example, in a chat application, the network layer (sending and receiving messages) may clearly be differentiated from the treatment part of the application (GUI). Nevertheless, when considering this from an evolution perspective, the chat application may also be reworked to let users share files and consult or exchange them while chatting. In this case, the network layer should probably also be able to handle file sharing, as should the GUI. The problem is that entry points in the code may have evolved and the weaving may be much more complex. In our opinion, AOP is a technique that works particularly well when the design process has been carried out cleanly. As soon as the application changes substantially, the design has to be thought out again, and this may imply great modifications in the way aspects are woven.

2.1.3 Reengineering Techniques.

In this subsection we look at the main way in which long-running programs evolve at compile-time: through reengineering techniques. At the moment, there is a large base of operational software that is no longer understood by the people owning it, for a very simple reason: the initial developers are often no longer in the organization owning the software, and the software itself is often not well documented. This constitutes what are called legacy systems and, according to Ian Sommerville [156], in 1990 “it was estimated that there were 120 billion lines of code in existence”. This implies that we cannot simply assume that programming with a new methodology would be sufficient for evolving software. Thus, reengineering techniques have to be found in order to make such applications evolve.

In the literature [104], within what we call here reengineering techniques, a subtle distinction is made between inverse engineering, reverse engineering and re-engineering.

Inverse engineering is best explained as the activity of taking legacy code, transforming it into an intermediate form (dependent on the inverse engineering platform) common to all supported languages [179, 178], and finally transforming the produced code by applying a set of transformations that build clearer or better code.

Reverse engineering is the activity of taking source code and reusing it as much as possible in the production of new code [154]. In recent years, this term has been widely used in the open source community because open source programmers typically need to reuse some closed source code in order to build new drivers for open source platforms (typically Linux [175]). In this case they proceed by decompiling the given drivers and then reuse the obtained code to build the new driver (e.g. sonypi [137]).

Re-engineering (also known as refactoring) is the activity of modifying the source code through small transformations [114] that should change unstructured code into structured code without correcting design flaws. In recent years this term has taken on a larger meaning that includes inverse engineering. The original meaning is now named refactoring [33].

Refactoring an application is probably the most interesting method for evolution, as it actually consists of applying changes to the code. Refactoring an application is only the first step toward the evolution of its behavior, since it does not modify the behavior itself. The goal behind refactoring is to transform the code into more understandable code. This step may be needed when correcting bugs and developing new functionalities. Nevertheless, it does not address the evolution problem in its entirety.


2.2 Load-Time Changes

In this section we present a broad range of solutions for providing load-time changes. They usually consist of techniques that make it possible to load libraries or code repositories at runtime and, usually, to unload them. In the first part, we detail evolution through dynamic loading. In the second part, we explain the late-binding mechanism and show how it contributes to evolution.

2.2.1 Dynamic Loading.

As C++ [44] is a compiled language, it has been a real challenge to dynamically add code and use it, since it is difficult to make a call in a compiled language without a dedicated reflective module to handle the evolution behavior. Stroustrup writes [164]: “Early experiments integrating C++ and dynamic linking were promising so I had expected dynamic linking to be common years ago.” (p. 206). As we will see in this subsection, only a few advances addressed the problem for C++, but some features, along with the dynamic loading of modules coming from the Lisp community, inspired the design of more modern languages (like Java [67, 96, 99]). In the rest of this subsection we explain the different approaches to dynamic loading, first those related to the Lisp community, then those for C++.

We then show the mechanisms devoted to evolution through the use of version numbers.

Basic Mechanisms. In the earliest works, the Lisp [49, 57, 140] and Scheme [74, 28] communities were interested in developing module systems that would allow modules to be shared among different programs. Such a viewpoint led researchers to focus on loading and compiling on demand. While this constitutes a dynamic loading mechanism, it lacks the main feature that distinguishes evolution from incremental loading, namely the possibility to unload modules.

In early work [40], Dorward, Sethi and Shopiro propose a system to dynamically add code to a running C++ program. Their system relies on late linking of the code of classes derived from classes already present in the program, requiring the newly loaded class to have the same signature as its parent. They concentrate on preserving type safety and on having a portable system. The programmer then uses a preprocessor to introduce the code and keep the C++ code portable. Evolution lies not only in the dynamic loading of C++ classes but also in the fact that programmers may “promote” objects to a subclass (transform an instance of a parent class into an instance of a subclass), as it is “a safe direction” regarding type safety (although the appearance of new, potentially uninitialized variables may not be taken into account).

Palay describes ∆C++ [134], which consists of a special compiler for unmodified C++ programs that allows compiled programs to dynamically load libraries that have evolved, without recompiling the programs. Four (“compatible”) modifications are supported: member-extension (adding new methods and fields to a class), class-extension (adding a new base class to a class), member-promotion (moving a method or a field to a super-class) and override-changing (adding an overriding method in a subclass). In general, ∆C++ allows programmers to modify dynamic libraries if methods keep the same signature and if the modification is type-safe. Technically, a linker is invoked at startup and is then able to handle “compatible” changes in the different libraries. It is not possible to modify classes during runtime, and thus evolution is limited to extending libraries between program startups and to providing a mechanism that does not require recompilation if changes are made to dynamically linked libraries.

The current C (and hence C++) dynamic linking facility for Unix and Linux is dlopen [144]. It allows libraries to be loaded dynamically and used through manipulation primitives. This system lets users load and unload shared libraries and thus make them evolve during execution. The main problem is that programmers need a strong knowledge of these possibilities and of the libraries involved in this process, as the available primitives do not make it possible to get the complete list of the functions loaded from the library. Another problem that may arise when using a library that evolves comes from the fact that it is not possible to force a library to be reloaded. Indeed, the dynamic loading package maintains a counter on the library and unloads it only when the counter drops to 0 (after which it may be read again from disk).
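The reloading problem follows directly from the counter. The sketch below is a pure model of that behavior in Python and does not call the real dlopen; the LibraryTable class, its open and close methods and the version strings standing in for library code are all assumptions made for illustration.

```python
class LibraryTable:
    """Models reference-counted loading; it is not a binding to dlopen."""

    def __init__(self, disk):
        self._disk = disk    # library name -> code currently on disk
        self._loaded = {}    # library name -> [code in memory, reference count]

    def open(self, name):
        if name not in self._loaded:
            # The disk is read only when the library is not already in memory.
            self._loaded[name] = [self._disk[name], 0]
        self._loaded[name][1] += 1
        return self._loaded[name][0]

    def close(self, name):
        entry = self._loaded[name]
        entry[1] -= 1
        if entry[1] == 0:
            # Unloaded only when the last user has closed it.
            del self._loaded[name]


disk = {"libcodec": "v1"}
table = LibraryTable(disk)

table.open("libcodec")           # first user
table.open("libcodec")           # second user, counter is now 2
disk["libcodec"] = "v2"          # a new version is installed on disk

table.close("libcodec")          # one user tries to reload...
print(table.open("libcodec"))    # ...but still gets the old code

table.close("libcodec")
table.close("libcodec")          # counter reaches 0, library is unloaded
print(table.open("libcodec"))    # only now is the new version read
```

In this model a single component cannot force the update on its own: every holder of the library has to release it before the new version becomes visible.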

Other languages like Smalltalk/X [24] and Objective C [5] offer similar constructs through the use of fileIn and NSBundle, respectively.

Version Aware Loading. Goldstein and Sloane describe [65] a way to support dynamically reloadable classes in C++ to address the problem of effective evolution in shared libraries. According to the authors, maintaining compatibility between versions of a library is problematic, and distributed object technologies complicate library implementations, showing the need to have multiple versions of the same class linked into the same program. To achieve this, the authors manipulate the linkage mechanisms, which makes it possible for the same calls to reach different classes depending on the versions of the dynamic libraries in use at the time. Only classes that may evolve are written as dynamically relinkable. This work intends to address transparently the problems raised by the fact that objects of a newer version may be passed to a program that originally wanted to use an older version. Thus, it asks the dynamic library to define the older version regardless of whether the class has already been defined. The system they propose needs a particular compiler and thus is highly platform dependent. While passing new objects to old versions does not pose many problems, thanks to the algorithm that builds data structures (principally, it appends data and forbids changing the order of previously defined methods and fields), passing older objects to newer versions of the code is not likely to work.

Hamilton and Radia describe the Spring distributed environment [73], in which they consider the problems linked to interface evolution. In the distributed framework they consider, it is impossible to update the whole system at once. Different versions of interfaces may then coexist in the system and may be linked to the corresponding programs when needed. The authors separate version changes into major and minor revisions. If changes made to an interface make it unusable by old clients, it is a major revision; otherwise it is a minor revision. Interfaces are described using the OMG’s IDL [37] and support multiple inheritance. The different versions of an interface are classes that represent objects on which it is possible to apply various methods. Each interface has a base type that just defines the name of the class. Major revisions are subclasses of this base type, and minor revisions are successive subclasses of major revisions. Doing so means that compatibility of newer minor revisions is ensured, because each extends the previous ones, and thus it is not necessary to recompile client programs. When new major revisions enter the system, both versions coexist as long as needed. Thus, clients do not need to be stopped for recompiling as long as they do not use the newest major revision. However, clients need to be modified and recompiled if they use a newer major revision. The main problems with this approach are the following: the administrator has to be strongly involved with users to determine when he may effectively remove a major revision and its minor revisions; in addition, if an interface changes and that interface is used in another interface, it is the old version that is used, and there is no alternative but to recompile and adapt the new code to the new interface. This work is only an additional infrastructure for realizing transparent evolution and multi-versioning of code, but it gives a first infrastructure to think about: transparent evolution is done using subclassing for small updates, whereas major updates need reengineering of the old code, and thus the approach is not suitable for major updates.
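The revision scheme maps naturally onto a small class hierarchy. The following sketch uses Python classes in place of IDL interfaces; the FileService names, the methods and the old_client function are hypothetical and only illustrate the subclassing rule described above.

```python
class FileService:
    """Base type: it only names the interface."""


class FileServiceV1(FileService):
    """Major revision 1."""

    def read(self, path):
        return f"read {path}"


class FileServiceV1_1(FileServiceV1):
    """Minor revision 1.1: a subclass of 1.0, so it only extends it."""

    def stat(self, path):
        return f"stat {path}"


class FileServiceV2(FileService):
    """Major revision 2: a sibling of revision 1, not usable by old clients."""

    def read(self, path, offset):
        return f"read {path} from {offset}"


def old_client(service):
    """A client built against major revision 1."""
    if not isinstance(service, FileServiceV1):
        raise TypeError("client must be adapted to this major revision")
    return service.read("/etc/motd")


print(old_client(FileServiceV1_1()))    # a minor revision is accepted unchanged
```

Passing a FileServiceV2 instance to old_client raises TypeError, which mirrors the need to modify and recompile clients that move to a new major revision.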

Hjálmtýsson and Gray [80] describe a way to provide dynamic loading of C++ classes. The mechanisms used are mainly similar to those proposed by Dorward et al. [40], but the possibilities offered differ in two ways: they allow replacement of previously loaded classes, and they allow multiple versions of the same class to coexist in the program. The approach consists in building an indirection level between client classes and the dynamic classes, using a template class that allows the creation of objects of the desired class. An object is then accessible directly through its handler and may be used as if it had been defined inside the program. The first limitation of this approach is that dynamic libraries may be loaded only once by name, which means that a new version needs a new name to be effectively loaded. Another limitation is that an inheritance tree needs to be completely present in the dynamic library in order to be taken into account, due to the template class encapsulating the dynamic version of the class. The last limitation, which is not even identified in [80], is correctness: as there may be many different versions of the same class in the program, it is also possible to pass references from an old version to the new one. This means that programmers must be aware that their classes will be versioned.
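The indirection level and the correctness hazard can both be seen in a small model. This is a loose Python analogy of the idea, not the C++ template mechanism of [80]; DynamicClass, load, create and the two Counter versions are hypothetical names.

```python
class DynamicClass:
    """Indirection between client code and the loaded versions of one class."""

    def __init__(self):
        self._versions = []    # every loaded implementation, oldest first

    def load(self, implementation):
        """Register a new version; earlier versions stay available."""
        self._versions.append(implementation)
        return len(self._versions)    # version number, starting at 1

    def create(self, *args, version=None):
        """Instantiate the latest version, or an explicitly requested one."""
        cls = self._versions[-1] if version is None else self._versions[version - 1]
        return cls(*args)


class CounterV1:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value


class CounterV2:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 2    # changed behavior in the new version
        return self.value


counter_class = DynamicClass()
counter_class.load(CounterV1)
old = counter_class.create()     # client code names no concrete class
counter_class.load(CounterV2)
new = counter_class.create()     # same call, new implementation

print(old.increment(), new.increment())
print(type(old) is type(new))
```

Both objects live in the same program and look alike to client code, yet they belong to different classes. Code that receives old where it expects the behavior of new will not be warned, which is the correctness issue raised in the text.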

2.2.2 Dynamic Loading as a Late Binding Mechanism

In Java 2 [67], dynamic loading is integrated in the language and relies on instances of the class ClassLoader, which load classes. This is due to the fact that, at an early stage of the development of the language, applets were a primary concern for the language designers. Thus, several possibilities may be used to retrieve a class. In their work [96], Liang and Bracha describe the dynamic class loading principles for Java 2. In Java 2, different classes may subclass ClassLoader, in particular SecureClassLoader from the API or NetworkClassLoader (an example presented in the API). A ClassLoader instance defines a name space in which only one class may have a given name. Loaders are hierarchically organized, and it is possible to delegate the loading of a class to another ClassLoader. Explicitly instantiating a class in a ClassLoader reserves the name of the class and lets no other class be referenced using this name in this ClassLoader. One possibility for having multiple versions of a class usable in a program is to delegate the loading of the different versions to different class loaders and, with the reflection package of Java 2, to instantiate them and apply methods to the defined classes (see [96] for an example). This is, at this point, the only way to build programs that evolve in Java 2. Although dynamic evolution seems to be one of the goals of the class loading mechanism presented by Liang and Bracha [96], it still lacks simplicity. In fact, to be used it needs an infrastructure built around the ClassLoader concept that effects state transfer of objects and allows the use of multiple versions of classes.
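The idea of one name space per loader can be imitated outside Java. The sketch below is a Python analogy and does not use or reproduce the Java ClassLoader API: each fresh module object plays the role of a loader's name space, and load_in_namespace, SOURCE_V1, SOURCE_V2 and the Page class are hypothetical names introduced for the example.

```python
import types


def load_in_namespace(namespace_name, source):
    """Execute class source inside a fresh module used as a private name space."""
    module = types.ModuleType(namespace_name)
    exec(source, module.__dict__)
    return module


SOURCE_V1 = "class Page:\n    def render(self):\n        return 'v1'\n"
SOURCE_V2 = "class Page:\n    def render(self):\n        return 'v2'\n"

# Two name spaces, each holding its own class called Page.
loader_a = load_in_namespace("loader_a", SOURCE_V1)
loader_b = load_in_namespace("loader_b", SOURCE_V2)

# Classes are looked up by name and instantiated reflectively.
old_page = getattr(loader_a, "Page")()
new_page = getattr(loader_b, "Page")()

print(old_page.render(), new_page.render())
print(loader_a.Page is loader_b.Page)
```

The two Page classes share a name but are unrelated types, so an object of the old version is not an instance of the new one. This is why the text notes that a surrounding infrastructure for state transfer is still needed before the mechanism supports evolution.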

C# is another language that allows dynamic loading and late binding.

As identified by Drossopoulou et al. [42] there are very similar features for
