HAL Id: inria-00000136
https://hal.inria.fr/inria-00000136
Submitted on 24 Jun 2005
Network Communications in Grid Computing: At a Crossroads Between Parallel and Distributed Worlds
Alexandre Denis, Christian Pérez, Thierry Priol
To cite this version:
Alexandre Denis, Christian Pérez, Thierry Priol. Network Communications in Grid Computing: At a Crossroads Between Parallel and Distributed Worlds. 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), Apr 2004, Santa Fe, United States. pp.95a. inria-00000136
Network Communications in Grid Computing:
At a Crossroads Between Parallel and Distributed Worlds
Alexandre Denis¹, Christian Pérez², Thierry Priol²
¹IRISA/IFSIC, ²IRISA/INRIA
Campus de Beaulieu — F-35042 Rennes Cedex — France
Alexandre.Denis@irisa.fr, Christian.Perez@irisa.fr, Thierry.Priol@irisa.fr
Abstract
This paper studies a communication model that aims at extending the scope of computational grids by allowing the execution of parallel and/or distributed applications without imposing any programming constraints or the use of a particular communication layer. Such a model leads to the design of a communication framework for grids which allows the use of the middleware appropriate for the application rather than the one dictated by the available resources. Such a framework is able to handle any communication middleware —even several at the same time— on any kind of networking technology. Our proposed dual-abstraction (parallel and distributed) model is organized into three layers: arbitration, abstraction and personalities, which are highlighted in the paper. The performance obtained with PadicoTM, our available open-source implementation of the proposed framework, shows that such functionality can be obtained while still providing very high performance.
1. Introduction
The emergence of computational grids as new high-performance computing infrastructures gives users access to computing resources at an unprecedented scale in the history of computing. However, computational grids differ from previous computing infrastructures since they exhibit both parallel and distributed aspects: a computational grid is a set of various and widely distributed computing resources, which are often parallel, ranging from high-performance supercomputers to clusters of PCs. As a consequence, a grid usually contains various networking technologies, from SANs in a room to WANs at a continental scale.
This work was supported by the Incentive Concerted Action “GRID”
(ACI GRID) of the French Ministry of Research.
Ideally, when applications are deployed on grid resources, they should adapt themselves to their environment, and to the networks in particular. The current programming practices associated with computational grids were strongly influenced by such an adaptation capability. A common programming approach is to see the grid as a virtual parallel computer, so that programmers can follow the usual techniques of parallel programming, for example with MPI. Since MPI is available on a large number of networking technologies, applications based on this communication middleware are able to adapt to the networking environment. Such an adaptation is performed at the application programming interface level. However, adaptation is also required at runtime. For example, an application linked with an MPI library configured to use the GM driver of a Myrinet network restricts the application deployment to systems that provide such a network.
However, providing a single (message-based) communication model will not be enough for most applications, because it does not take into account other communications such as visualization, steering, coupling of simulation codes, or interactive control. Therefore, in addition to a parallel middleware system such as MPI, at least one other middleware system is required to handle these new kinds of interaction. Such a middleware system should be distributed-oriented to handle dynamic connection/disconnection.
The first contribution of this paper is to propose a communication framework that decouples application middleware systems from the actual networking environment.
Hence, applications become able to transparently and efficiently utilize any kind of communication middleware (either parallel or distributed) on any network that they are deployed on, thus removing the aforementioned deployment constraints. As a second contribution of this paper, the proposed model is able to concurrently support several communication middleware systems with very few or no changes. Such a capability is very important when using modern programming practices such as distributed component programming for the design of HPC applications. Indeed, distributed component models, such as CCA [2] or GridCCM [20], require a communication middleware for communication between components; if the code inside the components is parallel, then a communication middleware is also used inside the components. As a consequence, these modern programming practices need two middleware systems, one for intra-component communications and another for inter-component communications. We have shown in [10] that even for standard networking technologies such as Ethernet with TCP/IP, sharing such a network interface between two middleware systems raises some serious technical concerns.
The remainder of this paper is organized as follows. Section 2 presents an analysis of grid communication based on some examples of typical grid usage. In Section 3, we propose a communication framework model that supports both parallelism and distributed computing. Section 4 describes the implementation of this model in the PadicoTM platform, and Section 5 evaluates its performance. Section 6 presents related work. Finally, we conclude in Section 7.
2. Grid Communication Model Analysis
This section introduces some important features we think that current and forthcoming grid-enabled applications will require. Then, it defines the communication paradigms and analyzes communication abstraction so as to draw the main directions for a communication framework for grids.
2.1. Grid Network Use Analysis
A grid application can be deployed on different resource configurations. For instance, one deployment configuration may be a set of nodes within a single PC cluster equipped with a high-performance network, while another deployment configuration may be a set of nodes in two separate PC clusters interconnected through a high-bandwidth WAN. Another example of grid use is given by parallel component-based applications [2, 20] where a component embeds a parallel code. The component framework uses its own paradigm to interconnect components. This paradigm should be independent from the communication paradigms used internally by parallel components. Hence, an MPI-based component could be connected to a PVM-based component.
A final example is a grid application which supports connection and disconnection from the user to visualize and/or monitor the ongoing computation. Hence, the grid application is likely to use at least two middleware systems: one or more for the computation and another for visualization/monitoring.
These scenarios introduce some important features which should be supported by grid-enabled middleware sys- tems:
Transparency — The middleware systems used by an application should be able to transparently and efficiently use the available resources. For example, an MPI, PVM, Java or CORBA communication should be able to utilize high-speed networks (SAN) as well as local area networks (LAN) and wide area networks (WAN). Moreover, they should adapt their security requirements to the characteristics of the underlying network; e.g., if the network is secure, it is useless to cipher data.
Flexibility — There is a diversity of middleware systems, and we can assume there will always be. It seems im- portant not to tie grid applications to a specific grid framework but instead to ease the “gridification” of middleware systems.
Interoperability — Grids are not a closed world. Grid applications will need to be accessible using standard protocols. So, protocol interoperability must be preserved.
Support for Multiple Communication Paradigms — Some programming models like parallel components (CCA [2], GridCCM [20]), or situations like a SOAP-based monitoring system of an MPI application, require several middleware systems. Thus, it is important to allow different middleware systems to be used simultaneously.
2.2. Communication Paradigm Analysis
If we define a communication paradigm as a family of middleware systems which are built on the same model, we can distinguish two important kinds of communica- tion paradigms: the parallel paradigm and the distributed paradigm.
Parallel paradigm — The main aspect of parallelism is, without any doubt, high performance. Communications take place inside a definite and usually static set of nodes known to each other (mostly SPMD-oriented), messages have well-defined boundaries, the API is optimized for zero-copy implementations, and there are collective operations which involve several nodes of the set. A typical example is MPI. We can distinguish distributed-memory parallelism and shared-memory parallelism; in this network-centric paper, we focus on distributed-memory parallelism.
Distributed paradigm — The main constraint is interoperability. Connections are dynamic, managed on a per-link basis in a client/server way; interoperability is provided across architectures, operating systems and software vendors; communication primitives may use streaming. Some typical examples are TCP/IP, CORBA or SOAP.
These are our definitions and will be used in the remainder of this paper. They should be understood as a classification with soft boundaries, not as absolute rules; for example, MPI-2 allows dynamic connections, and a DSM system (Distributed Shared Memory) is not message-based, but we still consider them as parallel. In this paper, we consider TCP/IP and UDP, CORBA-IIOP [18], SOAP [4], HLA-RTI [15] and Java RMI as distributed-oriented; MPI, PVM, DSM, FastMessage, Madeleine [3] or Panda [22] as parallel-oriented.
2.3. Abstraction Level Analysis
The last step of our analysis deals with the different lev- els of communication abstraction found in a grid applica- tion.
The resource abstraction principle consists in the definition of an abstract interface which is not bound to any particular implementation. There may exist several incarnations which implement the same abstract interface. Abstraction is a widely used mechanism to cope with the differences between various kinds of networks; in this case, it is called a portability mechanism. When an abstract interface for portability is designed to be used by several middleware systems and/or applications (and not only for the portability of one middleware system), it is called a genericity mechanism. This results in a stack of software layers whose abstraction level increases bottom-up:
System-level — implemented by a network driver such as GM, BIP [21], VIA, Sisci [14] or another vendor-supplied communication library, or by the operating system such as TCP/IP.
Generic-level — implemented by a communication framework, such as Madeleine [3], Nexus [12] or Panda [22]. The API, independent from the network, is likely to be used by a middleware system.
Application-level — implemented by a middleware system, such as CORBA, MPI, PVM or HLA-RTI. It implements a programming model. The API is designed to be used by applications.
3. A Model for Grid Communication Frameworks

This section presents our proposed model of a communication framework for grids that takes into account both parallel and distributed paradigms.
3.1. Abstraction Model Study
The commonly used abstraction model brings portability: the ability for a middleware system to utilize several kinds of networks, according to what is available. It also brings genericity: the ability to reuse the portability software infrastructure for several middleware systems. However, genericity is usually brought by the definition of a unique abstract interface. This choice of a unique abstract interface is especially relevant for portability, but is questionable regarding genericity: this approach is generic only inside a particular paradigm —the paradigm chosen for the abstract interface.
Which abstract interface? Since we want our communication framework to be able to support middleware systems based on both paradigms, we want to find an abstract interface able to be used by both kinds of middleware. We may think of using a unique distributed-oriented abstract interface. Indeed, a lot of parallel middleware can utilize TCP/IP sockets, which are a distributed abstract interface. This approach is well adapted for making a parallel system look like a distributed network infrastructure (e.g., Ethernet). However, it seems irrelevant on a parallel-oriented network, such as the internal network of a supercomputer or a cluster. As depicted in Figure 1 (a), the use of a single abstract interface imposes unnecessary compromises, in particular when running a parallel application on a parallel machine! In this case, for example, an MPI implementation built atop TCP/IP is able to run on most networking resources, including supercomputer networks, but is unable to utilize "parallel-specific" properties of these networks, such as optimized collective operations. This is due to the lack of expressiveness of the distributed-oriented TCP/IP API.
Symmetrically, it is quite common to use a unique parallel interface on grids, for example MPICH-G2 [11]. It is possible to use it to implement distributed-oriented communication mechanisms, such as distributed objects. However, the parallel-oriented MPI interface cannot express properties which are essential for distributed computing, such as IP addressing, dynamic connections in a client/server fashion (not spawn as in MPI-2), or interoperability with other standard implementations. For instance, it seems impossible to build a standard-conforming CORBA implementation on top of MPICH-G2 (or more precisely, MPICH's abstract interface called "ADI-2") alone.
In both cases, a unique abstract interface biased towards only parallelism or distributed computing penalizes the middleware systems from the other paradigm, since some properties available at system-level cannot be expressed by the abstract interface, so they are lost.

Figure 1. Several abstraction models may be envisaged. (a) Everything expressed through a single (distributed) abstraction: two cross-paradigm translations are needed for a parallel middleware atop a parallel network. (b) A unified abstraction makes compromises in all cases: it gives up most possible optimizations and imposes compromises on everything. (c) Dual-abstraction model: different abstract interfaces are used for different paradigms; only required compromises are made.
Avoid the “bottleneck of features”. Thus we should try to find a better abstract interface which would combine properties from both parallelism and distributed computing, as depicted in Figure 1 (b); this abstraction would “keep the best of both worlds”. In order to take into account the interoperability constraint from distributed computing, a unified abstract interface cannot be far from a distributed-oriented interface. More generally, it seems unrealistic to weaken the strong constraints of distributed computing to make them look more like the weaker hypotheses of parallelism which allow some optimizations: giving up the streaming capability from distributed computing in order to optimize a message-based communication system (à la MPI or Madeleine) breaks the required interoperability with TCP/IP; using topology and hardware configuration information to optimize collective operations seems incompatible with the per-link connection management and interoperability with standard plain IP from the distributed side.
A unified abstract interface cannot give up the strong constraints required by the distributed side; thus it uselessly imposes these strong constraints even on the parallel side. A single abstract interface, be it distributed, parallel, or unified, does not seem satisfactory.
Rather than trying to unify contrary things, we propose a dual-abstraction interface, with both a parallel- and a distributed-oriented interface. Each middleware system is either parallel or distributed, not both at the same time. For example, CORBA, HLA and SOAP are distributed, while MPI and PVM need a parallel abstract interface. There is no need to find an interface which would be both; it is sufficient to provide each middleware system with the appropriate abstract interface, and to supply each abstract interface on both kinds of networks. This dual-abstraction approach is depicted in Figure 1 (c). Each middleware system utilizes the required abstract interface. Each abstract interface is instantiated on each network through an adapter: an adapter may be either straight or cross-paradigm. Consequently, compromises for cross-paradigm translation are performed only when they are required. With such a dual-abstraction model, there always exists an abstract-level interface able to express the properties of each kind of hardware. Bending all system-level interfaces towards a unique abstraction does not seem appropriate because it loses some key features: a communication framework for grids can be neither parallel-only nor distributed-only. We chose to build our grid communication framework on this dual-abstraction model.
3.2. Resource Virtualization for Seamless Swapping of Communication Methods
The middleware systems likely to be used by grid-enabled applications are various: MPI, CORBA, SOAP, HLA, JVM, PVM, etc. Moreover, for each kind of middleware, there are several implementations which have their own specific properties. Developing a middleware system is a heavy task —for example, MPICH contains 200,000 lines of C— and requires very specific skills. Moreover, the standards —and thus, the middleware systems themselves— are ever-changing. It does not seem reasonable to re-develop an implementation of each of these middleware systems specifically for a given communication framework. Instead, we chose to re-use existing implementations. It is thus easy to follow new versions and to use the specific features of a given implementation.
To seamlessly re-use existing implementations of middleware systems, we chose to virtualize networking resources. This consists in giving the middleware system the illusion that it is using the usual resource it knows, even if the real underlying resource is completely different. For example, we show a “socket” API to a CORBA implementation so as to make it believe it is using TCP/IP, even if it is actually using another protocol/network behind the scenes.
This is performed through the use of thin wrappers on top of the appropriate abstract interface to make it look like the required API. We call these small wrappers personalities. It is possible to give several personalities to an abstract interface.
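A personality can be sketched as follows. This is a minimal illustration, assuming a hypothetical abstract interface `vlink_t` with a `vlink_write()` primitive; none of these names are PadicoTM's actual identifiers. The personality only adapts the syntax, exposing a socket-like `send()` signature and delegating to the abstract interface:

```c
/* Sketch of a "personality": a thin syntactic wrapper over an assumed
 * abstract interface. vlink_t and vlink_write are illustrative only. */
#include <stddef.h>
#include <string.h>

typedef struct {
    char   buf[1024];   /* stand-in for the real transport */
    size_t len;
} vlink_t;

/* Abstract-interface primitive (assumed). */
size_t vlink_write(vlink_t *v, const void *data, size_t n) {
    if (n > sizeof v->buf - v->len)
        n = sizeof v->buf - v->len;
    memcpy(v->buf + v->len, data, n);
    v->len += n;
    return n;
}

/* Socket-like personality: same semantics, different syntax; the
 * socket flags argument has no meaning here and is ignored. */
long vio_send(vlink_t *v, const void *data, size_t n, int flags) {
    (void)flags;
    return (long)vlink_write(v, data, n);
}
```

The point of the sketch is that the wrapper contains no protocol logic at all: it merely reshapes the call so that unmodified socket-style code can drive the abstract interface.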
Virtualization and abstraction mechanisms with cross-paradigm adapters allow any middleware system to seamlessly utilize any network. However, even if a straight adapter is available, it is not always the best method, especially on distributed-oriented networks. Other methods include, for example:
Parallel streams on WAN — Over a high-bandwidth high-latency WAN with TCP/IP, each single packet loss can dramatically lower the bandwidth. A solution consists in utilizing multiple sockets in parallel for a single logical link, so as to reduce the influence of each isolated loss. This principle of parallel streams is already used, for example, in GridFTP [1].
Online compression — On slow networks, it may be worth compressing data to speed up transfers. AdOC [16] implements an adaptive online compression mechanism.
Encryption and authentication — When a connection lies between two different sites, it is likely that the user wants authentication and/or encryption. This may be achieved through the use of a protocol plug-in. It raises a whole set of new problems, such as certificate management and credential delegation. We investigate the use of the Grid Security Infrastructure (GSI) [13] or IPsec.
Loss-tolerant protocol — On slow WANs which suffer from a high loss rate, applications may prefer to trade reliability for better bandwidth, while not accepting totally uncontrollable losses. Such a tunable tradeoff is implemented in VRP [6], a protocol with a tunable loss tolerance.
These various communication methods may be supplied as alternate adapters beside straight and cross-paradigm adapters. They must exhibit the right abstract interface according to their respective paradigm. Their use is thus seamless from the point of view of the middleware systems.
Thanks to these virtualization mechanisms, the hardware resources do not curb the programming model to be used in applications. The possible deployment schemes are more advanced than just parallel applications on a parallel machine or distributed applications on a distributed system. Each middleware system is able to use all available resources —parallel and distributed— with the most appropriate method; e.g., CORBA as well as MPI are able to efficiently use Myrinet if available, or use WAN-specific methods if necessary. The virtualization enables the use of a communication paradigm not dictated by the hardware.
3.3. A Hybrid Parallel + Distributed Model
In this section, we propose a model of a communication framework for grids, based on a three-layer approach, with both parallel- and distributed-oriented abstract interfaces.

An implementation of this model is depicted in Figure 2. Our proposed dual-abstraction model is organized in three layers: arbitration, abstraction, and personalities. Parallel and distributed paradigms are present at each level. Therefore, cross-paradigm translation is performed only when required (i.e., distributed middleware atop parallel hardware or parallel middleware atop distributed networks), with no bottleneck of features.
Arbitration layer. Concurrent access to network hardware by multiple middleware systems at the same time is not straightforward. There is a high risk of access conflicts. We propose that arbitration should be dealt with at the lowest possible level, so as to build more advanced abstractions atop a fully reentrant system. Arbitration is performed by a layer which provides consistent, reentrant and multiplexed access to every networking resource; each resource is utilized with the most appropriate driver and method. The arbitrated interfaces are designed for efficiency and reentrance. Thus, we propose these APIs to be callback-based (à la Active Messages). For true arbitration, this layer is the only client of the system-level resources: all accesses to the network should be performed through the arbitration layer. It also provides arbitration between different networks (e.g., Myrinet against Ethernet) so that they do not disturb each other, and between different adapters (as defined in Section 3.1) on the same network (e.g., both CORBA and MPI on Myrinet) even if the communication library does not provide multiplexing. More details about cooperative rather than competitive access are given in [9].
Figure 2. Implementation of the model in PadicoTM: arbitrated interfaces (NetAccess MadIO over Madeleine, NetAccess SysIO over system sockets), abstract interfaces (Circuit and VLink) reached through adapters, and personalities (BSD sockets, FM, Madeleine, VIO, AIO) supplying standard interfaces on top.
Abstraction Layer. On top of the arbitration layer, we propose an abstraction layer which provides higher-level services, independent from the hardware. Its goal is to provide abstract interfaces well suited to their use by various middleware systems. The abstraction layer should be fully transparent: the interfaces are the same whatever the underlying network is. The abstraction layer supplies both parallel- and distributed-oriented abstract interfaces on top of every method from the arbitration layer, through modules called adapters. This layer is responsible for automatically and dynamically choosing the best available interface from the arbitration layer according to the available hardware; it then maps it onto the right abstract interface through the right adapter. As shown in Figure 2, adapters may be straight (same paradigm at system- and abstract-level, e.g., a parallel abstract interface on parallel hardware) or cross-paradigm —e.g., a distributed abstract interface on parallel hardware.
Personalities. In order to provide virtualized communication APIs, we propose a personality layer able to supply various standard APIs on top of the abstract interfaces. Personalities are thin wrappers which adapt a generic API to make it look like another API. They perform no protocol adaptation nor paradigm translation; they only adapt the syntax.
4. Implementation of the Communication Model
Padico [7] is our software infrastructure for Grid Computing. The communication model described in the previous section has been implemented in the high-performance runtime system of Padico called PadicoTM [9, 10], as depicted in Figure 2. The PadicoTM framework is used for parallel CORBA objects [8] and components [20]. This paper focuses only on the novel communication model proposed in PadicoTM. However, PadicoTM addresses other issues for integrating middleware systems, such as dynamic code loading and configuration, arbitration for multi-threading, memory management and Unix signals. These other issues are purposely not discussed in this paper.
4.1. Network Access Arbitration: NetAccess
The arbitration layer in PadicoTM is called NetAccess, which contains two subsystems: SysIO for access to system I/O (sockets, files), and MadIO for multiplexed access to high-performance networks. A core handles a consistent interleaving among the concurrent polling loops. NetAccess is open enough to allow the integration of other subsystems beside MadIO and SysIO for other paradigms, such as Shmem on SMP for example.
NetAccess MadIO: API for Accessing Parallel-oriented Hardware. For good I/O reactivity and portability over high-performance networks, we have chosen the high-performance network library Madeleine [3] as a foundation. Madeleine is used for high-performance networks such as Myrinet, SCI, VIA. Madeleine provides no more multiplexing channels than what is allowed by the hardware (e.g., 2 over Myrinet, 1 over SCI). MadIO adds a logical multiplexing/demultiplexing facility which allows an arbitrary number of communication channels. Multiplexing on top of Madeleine adds a header to all messages. This can significantly increase the latency if not done properly. We implement header combining to aggregate headers from several layers into a single packet. Thus, multiplexing on top of Madeleine adds virtually no overhead to middleware systems which send headers anyway. We actually measured that the overhead of MadIO over plain Madeleine is less than 0.1 µs, which is imperceptible on most current networks.
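The header-combining optimization can be illustrated as follows; the structure layout and names are assumptions for the sketch, not MadIO's actual wire format. Instead of one network send for the multiplexing header and another for the middleware's message, both are gathered into a single packet:

```c
/* Gather the multiplexing header and the payload (which typically
 * already starts with the middleware's own header) into one buffer,
 * so that a single network send carries both. */
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t channel;   /* logical channel id (assumed field)  */
    uint32_t len;       /* payload length                      */
} mux_hdr_t;

/* Returns the total packet size; out must hold header + payload. */
size_t combine(uint32_t channel, const void *payload, uint32_t len,
               unsigned char *out) {
    mux_hdr_t h;
    h.channel = channel;
    h.len = len;
    memcpy(out, &h, sizeof h);              /* multiplexing header      */
    memcpy(out + sizeof h, payload, len);   /* middleware header + data */
    return sizeof h + len;                  /* one packet, one send     */
}
```

Since the middleware was going to pay for a header transmission anyway, folding the multiplexing header into the same packet is what keeps the added latency negligible.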
NetAccess SysIO: API for Accessing Distributed-oriented Hardware. Contrary to a widespread belief, directly using the socket API from the OS does not bring full reentrance, multiplexing and cooperation. Several middleware systems not designed to work together may get into trouble when used simultaneously, even with only plain TCP/IP. There are reentrance issues for signal-driven I/O (used by middleware systems designed to deal with heavy load), which result in incorrect behavior or, worse, in a crash. If a middleware system uses blocking I/O and another uses active polling, the one which does active polling holds nearly 100 % of the CPU time; this results in inequity or even deadlock. To solve these conflicts, SysIO manages a unique receipt loop that scans the opened sockets and calls user-registered callback functions when a socket is ready. The callback-based design guarantees that there are no reentrance issues nor signals to mangle with.
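A minimal sketch of such a unique callback-based receipt loop, built on select(); the names (sysio_register, sysio_poll_once) and the fixed-size table are assumptions for illustration, not PadicoTM's API:

```c
/* One central loop owns all sockets: middleware systems register a
 * callback per socket instead of polling or using signal-driven I/O. */
#include <stddef.h>
#include <sys/select.h>

#define MAX_CB 16

typedef void (*sock_cb)(int fd, void *arg);

struct sysio_entry { int fd; sock_cb cb; void *arg; };
struct sysio_entry sysio_reg[MAX_CB];
int sysio_nreg = 0;

/* Returns the registration index, or -1 if the table is full. */
int sysio_register(int fd, sock_cb cb, void *arg) {
    if (sysio_nreg == MAX_CB)
        return -1;
    sysio_reg[sysio_nreg].fd  = fd;
    sysio_reg[sysio_nreg].cb  = cb;
    sysio_reg[sysio_nreg].arg = arg;
    return sysio_nreg++;
}

/* One iteration of the receipt loop: scan all registered sockets once
 * (non-blocking) and fire the callback of each ready one. */
int sysio_poll_once(void) {
    fd_set rd;
    struct timeval tv = {0, 0};
    int i, maxfd = -1, fired = 0;
    FD_ZERO(&rd);
    for (i = 0; i < sysio_nreg; i++) {
        FD_SET(sysio_reg[i].fd, &rd);
        if (sysio_reg[i].fd > maxfd)
            maxfd = sysio_reg[i].fd;
    }
    if (select(maxfd + 1, &rd, NULL, NULL, &tv) <= 0)
        return 0;
    for (i = 0; i < sysio_nreg; i++)
        if (FD_ISSET(sysio_reg[i].fd, &rd)) {
            sysio_reg[i].cb(sysio_reg[i].fd, sysio_reg[i].arg);
            fired++;
        }
    return fired;
}
```

Because all readiness detection funnels through one loop, no middleware system ever blocks, spins, or installs signal handlers on its own, which is what removes the conflicts described above.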
NetAccess core. The core of NetAccess manages the threads with the polling loops. It enforces fairness between SysIO and MadIO. The interleaving policy between SysIO and MadIO is dynamically user-tunable through a configuration API, to give more priority to system sockets or to the high-performance network depending on the application.
4.2. Abstractions: VLink and Circuit
The abstract interfaces in PadicoTM are called VLink for distributed computing, and Circuit for parallelism.
Distributed abstract interface: VLink. The VLink interface is designed for distributed computing. It is client/server-oriented, and supports dynamic connections and streaming. In order to easily allow several personalities —both synchronous and asynchronous—, VLink is based on a flexible asynchronous API. This API consists in five primitive operations: read, write, connect, accept, close. These functions are asynchronous: when they are invoked, they initiate (post) the operation and may return before completion. Their completion may be tested by polling the VLink descriptor; a handler may be set which will be called upon operation completion. Such a set of functions is called a VLink-driver. VLink drivers have been implemented on top of: MadIO, SysIO, Parallel Streams for WAN, AdOC [16], and loopback.
Abstract interface for parallelism: Circuit. The Circuit interface is designed for parallelism. It manages communications on a definite set of nodes called a group. A group may be an arbitrary set of nodes, e.g., a cluster or a subset of a cluster, and may span multiple clusters or even multiple sites. Circuit allows communications from every node to every other node through an interface optimized for parallel runtimes: it uses incremental packing with explicit semantics to allow on-the-fly packet reordering, as in Madeleine [3]. Collective operations in Circuit still need to be investigated. Circuit adapters have been implemented on top of MadIO, SysIO, loopback and VLink (to use the alternate VLink adapters); a given instance of Circuit can use different adapters for different links.
Selector. VLink and Circuit automatically choose which protocol to use according to a knowledge base of the net- work topology managed by PadicoTM and user-defined preferences. All protocols are available for both VLink and Circuit interfaces.
4.3. Personalities and Middleware Systems
PadicoTM provides several well-known APIs through simple “cosmetic” adapters over the VLink and Circuit abstract interfaces. These thin API wrappers are called personalities. The personalities for VLink are: Vio, for explicit use through a socket-like API; and SysWrap, which supplies a 100 % socket-compliant API through wrapping at link stage, for direct use within C, C++ or FORTRAN legacy codes without even recompiling. Thus, legacy applications are able to transparently use all PadicoTM communication methods without losing interoperability with PadicoTM-unaware applications on plain sockets. We implement an Aio personality on top of VLink which provides a plain POSIX Asynchronous I/O (Aio) API. Thin adapters on top of Circuit provide a FastMessage 2.0 API and a (virtual) Madeleine API.
Thanks to SysWrap, various middleware systems have been seamlessly ported to PadicoTM with no change in their code: CORBA implementations (omniORB 3, omniORB 4, ORBacus 4.0, all Mico 2.3.x including CCM-enabled versions), an HLA implementation (CERTI from ONERA), and a SOAP implementation (gSOAP 2.2). A Java virtual machine (Kaffe 1.0.7) has been slightly modified for use within PadicoTM, with some changes in its multi-threading management code. Thanks to the Madeleine personality, the existing MPICH/Madeleine implementation can run in PadicoTM. The middleware systems are dynamically loadable into PadicoTM. Arbitration guarantees that any combination of them may be used at the same time.
5. Performance Evaluation
Our test platform comprises dual-Pentium III 1 GHz nodes with 512 MB RAM, switched Ethernet-100, Myrinet-2000 and Linux 2.2. The raw bandwidth of various middleware systems in PadicoTM over Myrinet-2000 is depicted in Figure 3. The maximum bandwidth and
Figure 3. Bandwidth (MB/s) versus message size (bytes) over Myrinet-2000 for omniORB-3.0.2, omniORB-4.0.0, Mico-2.3.7, ORBacus-4.0.5, MPICH-1.1.2 and Java sockets, with TCP/Ethernet-100 as a reference.