DISTRIBUTED NETWORK SYSTEMS

(1)

(2)

DISTRIBUTED NETWORK SYSTEMS

(3)

Managing Editors:

Ding-Zhu Du

University of Minnesota, U.S.A.

Cauligi Raghavendra

University of Southern Califorina, U.S.A.

Network Theory and Applications

Volume 15

(4)

DISTRIBUTED NETWORK SYSTEMS

From Concepts to Implementations

by

WEIJIA JIA

City University of Hong Kong, P.R. China WANLEI ZHOU

Deakin University, Australia

Springer

(5)

0-387-23840-9 Print ISBN: 0-387-23839-5

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America Boston

Visit Springer's eBookstore at: http://ebooks.springerlink.com and the Springer Global Website Online at: http://www.springeronline.com

(6)

have evolved from traditional “LAN” based distributed systems towards “Internet based” systems. Although there exist many excellent textbooks on this topic, because of the fast development of distributed systems and network programming/protocols, we have difficulty in finding an appropriate textbook for the course of “distributed systems” with orientation to the requirement of the undergraduate level study for today’s distributed technology. Specifically, from up- to-date concepts, algorithms, and models to implementations for both distributed system designs and application programming.

Thus the philosophy behind this book is to integrate the concepts, algorithm designs and implementations of distributed systems based on network programming. After using several materials of other textbooks and research books, we found that many texts treat the distributed systems with separation of concepts, algorithm design and network programming and it is very difficult for students to map the concepts of distributed systems to the algorithm design, prototyping and implementations.

This book intends to enable readers, especially postgraduates and senior undergraduate level, to study up-to-date concepts, algorithms and network programming skills for building modern distributed systems. It enables students not only to master the concepts of distributed network system but also to readily use the material introduced into implementation practices.

The book takes an integrated approach to view the distributed system as a set of programming blocks cooperating on distributed sites. The primary objective of the concept, design and implementation is to meet the requirements or distributed applications based on the networking environment. In this book, networking and distribution design for applications are represented in the form of several dimensions. Therefore, the book describes the distributed systems along a line from general distributed system requirement of applications to system transparency that reflect system structure and algorithm designs and implementation techniques.

The striking features of the book, differs from others, can be illustrated from two basic aspects:

(1) The viewpoint of applications, i.e., what kinds of concepts and programming skill are fitted for the design of distributed systems and applications.

(2) The viewpoint of system designer and implementers, i.e., the system layers and their mapping to the design of distributed algorithms and their implementations.

The book not only provides the basic distributed systems and networks protocols (such as RPC, group communication and Mobile IP), but it also presents the discussion of recent technology development for Internet such as IP for next generation (IPv6 and multicast and anycast communication). As Web/Java

(19)

technology is getting important and popular nowadays, this book illustrates how a distributed system and network protocols can be designed and implemented with distributed system concepts and network programming in today’s Internet environment.

The book is composed of 15 chapters. Most chapters contain substantial materials about concepts, algorithm designs and implementation techniques. The outline of the book is given below.

Chapter 1. Overview of Distributed Systems: This chapter outlines the basic concepts of distributed systems and computer networks, such as their purposes, characteristics, advantages, and limitations, as well as their basic architectures, networking and applications.

Chapter 2 introduces the client-server model and its role in the development of distributed network systems. The chapter discusses the cooperation between clients and servers/group servers in distributed network systems, and addresses extensions to the client-server model. Service discovery, which is of crucial importance for achieving transparency in distributed network systems, is also elaborated in this chapter.

Chapter 3. Communication is an important issue in distributed computing systems:

This chapter addresses the communication paradigm of distributed network systems, i.e., issues about how to build the communication model for these systems.

Chapter 4. Internetworking. Network software is arranged in a hierarchy of layers:

Each layer presents an interface to the layers above it that extends the properties of the underlying communication system. Network functions are achieved through the layered protocols. This chapter discusses the communication protocols in a network, especially, TCP/IP protocols used on the current Internet. The next generation of Internet protocol – IPv6 is also addressed in the chapter.

Chapter 5. Interprocess Communication using Message-Passing: Processes in a distributed network system normally do not share common memory. Therefore, message-passing is one of the effective communication mechanisms between these processes. In this chapter we discuss the most commonly used message-passing based interprocess communication mechanism, i.e., the socket API.

Chapter 6. TCP/UDP Communication in Java: In this chapter we want to address the TCP/UDP programming in Java, since the Java language is currently the most commonly used language to implement a distributed computing system. Java provides the reliable stream-based communication for TCP as well as the unreliable datagram communication for UDP.

Chapter 7. Interprocess Communication using RPC: When using message-passing for interprocess communications, a programmer is aware of the passing of messages between the two processes. However, in a remote procedure call situation, passing of messages is invisible to the programmer. Instead, a language-level concept, the procedure call, is used to mask the actual communication between two processes. In this chapter we discuss two commonly used RPC tools, the DCE/RPC and the SUN/RPC. We have developed a RPC tool, called the Simple RPC tool, which will be described in the chapter. The idea of RPC has been extended to develop

(20)

interprocess communication mechanisms for object-oriented paradigm, notably the Remote Method Invocation (RMI) in Java. We also introduce this mechanism in the chapter.

Chapter 8. Group Communications is highly desirable for maintaining a consistent state in distributed systems. Many existing protocols are quite expensive and of limited benefit for distributed systems in terms of efficiency. This chapter describes concepts and design techniques of group communication protocol including message ordering, dynamic assessment of membership and fault tolerance. The protocol ensures total ordering of messages and atomicity of delivery in the presence of communication failures and site failures, and guarantees that all operational members belonging to the same group observe a consistent view of ordered events.

The dynamic membership and failure recovery algorithms can handle site failures and recovery; group partitions and merges; dynamic members join and leave.

Chapter 9. Reliability and Replication Techniques: A computer system, or a distributed system consists of many hardware/software components that are likely to fail eventually. In many cases, such failures may have disastrous results. With the ever-increasing dependency being placed on distributed systems, the number of users requiring fault tolerance is likely to increase. The design and understanding of fault-tolerant distributed systems is a very difficult task. We have to deal with not only all the complex problems of distributed systems when all the components are well, but also the more complex problems when some of the components fail. This chapter introduces the basic concepts and techniques that relate to fault-tolerant computing.

Chapter 10. Security: There is a pervasive need for measures to guarantee the privacy, integrity and availability of resources in distributed network systems.

Designers of secure distributed systems must cope with exposed service interfaces and insecure networks in an environment where attackers are likely to have knowledge of the algorithms used and to deploy computing resources. In this chapter we talk about security issues of distributed network systems, such as integrity mechanisms and encryption techniques, and in particular, the techniques for defense against Distributed Denial-of-Service attacks.

Chapter 11. A Reactive System Architecture for Fault-Tolerant Computing: Most fault-tolerant application programs cannot cope with constant changes in their environments and user requirements because they embed fault-tolerant computing policies and mechanisms together so that if policies or mechanisms are changed the whole programs have to be changed. This chapter presents a reactive system approach to overcoming this limitation. The reactive system concepts are an attractive paradigm for system design, development and maintenance because it separates policies from mechanisms. In the chapter we propose a generic reactive system architecture and use group communication primitives to model it. We then implement it as a generic package, which can be applied in any distributed applications. The system performance shows that it can be used in a distributed environment effectively.

Chapter 12. Web-Based Databases: World Wide Web has changed the way we do business and research. It also brings a lot of challenges, such as infinite contents, resource diversity, and maintenance and update of contents. Web-based database

(21)

(WBDB) is one of the answers to these challenges. In this chapter, we classify WBDB architectures into three types: two-tier architecture, three-tier architecture, and hybrid architectures, according to WBDB access methods. Then the existing technologies used in WBDB are introduced as various generations, i.e. the traditional Web (generation 1), fast and more interactive Web (generation 2), Java- based Web (generation 3), and a new generation combining the techniques of XML and mobile agents. Based on the introduction, we provide the challenges and some solutions for current WBDB. Finally we outline a future framework of WBDB.

Chapter 13. Mobile Computing: Mobile computing requires wireless communication, mobility and portability. In the past few years, we have seen an explosion of mobile devices over the world such as notebook, multimedia PDA and mobile phones. The rapidly expanding markets of cellular voice and limited data service have created a great demand for mobile communication and computing.

Mobile communications applications include mobile computing and wireless communications. Many of the advances in communications involve the use of Internet Protocol (IP), Asynchronous Transfer Mode (ATM), and ad hoc network protocols. Recently much focus has been directed at advancing communication technology in the area of mobile wireless networks especially on the IP based wireless networks. This chapter focuses on two major issues: Mobile IP and mobile multicast / anycast applications.

Chapter 14. Distributed Network Systems: Case Studies. In the previous chapters we have discussed various aspects of distributed network systems. Distributed network systems are now used everywhere, especially on the Internet. In this chapter we study several well-known distributed network systems, as the examples of our discussion.

Chapter 15. Distributed Network Systems: Current Development. This last chapter outlines the most recent development in distributed network systems. In particular, we present four “hot” topics that have attracted a lot of attention from both academia and industry. These topics include: cluster computing, grid computing, peer-to-peer computing, and pervasive computing. For each topic, we try to outline its current development, its potential applications and benefits, and its challenges. The purpose of this chapter is to broaden the reader’s knowledge in distributed network systems.

The book is suitable to any one who needs a informative introduction, basic design and programming strategies of distributed systems and applications. It serves as an idea textbook of one-semester course for senior undergraduates and post-graduates.

Chapters 1-6 serve as the basis for the distributed system design and network programming. There are diverse objectives for using the book: (1) For learning of distributed operating system design and implementations: Chapters 7, 8, 9, 10, and 14 can serve the purpose. (2) For readers who are interested in the design and implementations of web-based databases and Internet computing, Chapters 7, 8, 12 and 15 can be used. (3) To learn the concepts of fault-tolerant distributed system design, Chapters 8,9, 11 will serve the purpose. (4) For understanding group, RPC communication protocols and Mobile IP, Chapters 8, 10, 13 will help.

(22)

Acknowledgements

We are grateful to many classes students at City University of Hong Kong and Deakin University who have made a lot of feedbacks to our teaching materials as their comments inspire us to write this book. Inspirations also come from Dingzhu Du, Wei Zhao, Qing Li and Andrzej Goscinski.

The following people gave their time to help us to formulate the book, especially, Changgui Chen, who helped to edit the book and contributed partially to Chapter 13.

Yang Xiang contributed partially to Chapter 10; Mingjun Lan contributed partially to Chapter 12; and John Casey contributed partially to Section 15.2. Pui-On Au and Yujia Wang helped to format the final version of the book.

We would like to acknowledge some support from research grants we have received, in particular, CityU Strategic grant nos. 7001587/7001446 and UGC grant nos.

CityU 1055/01E and CityU 1076/00E, the Austalian Research Council Small Grant no. 0504-32409-0132-3501 and the Deakin University Research Grant 0504-23434- 3101. Although the research grants are not directly used to support the writing of the book, some interesting research results presented in the book are taken from our research papers which indeed (partially) supported through these grants. We also would like to express our appreciations to the editors in Kluwer Academic Publishers, especially John Martindale and Angela Quilici, for their excellent professional support.

Finally we are grateful to the family of each of us for their consistent and persistent supports. Weijia would like to present the book to XieMei and Sally. Wanlei would like to present the book to Ling, Lingdi and Andi. Without their support, the book may just become some unpublished discussions.

Weijia Wanlei 1-May-04

(23)

(24)

Biography of Authors

Dr. Weijia Jia is an Associate Professor in Department of Computer Science and Department of Computer Engineering and Information Tech., City University of Hong Kong. He received his BSc and MSc in Computer Science from Center South University (CSU), Changsha, China in 1982 and 1984, respectively. He joined the, CSUT as an Assistant Lecturer in 1984. From 1987 to 1988, as a guest researcher he worked at the Department of Computer Science, University of Ottawa, Canada.

From 1988 to 1991, he was a Lecturer in Department of Computer Science, CSU. In 1993, he received his PhD in Computer Science from Faculty Polytechnic of Mons, Belgium and joined German National Research Center for Information Technology (GMD) in St. Augustin as a research fellow. In 1995 he joined the Department of Computer Science, City University of Hong Kong as an assistant professor. His research interest includes computer network and systems with emphasis on parallel/distributed object group system, communication protocols, real-time and Internet communications. He has published extensively in these fields, especially the field of Anycast routing and applications. He is a member of IEEE, IEEE Communication Society and IEEE Computer Society.

Dr. Wanlei Zhou is a Chair Professor and Head of School of Information Technology, Deakin University, Melbourne, Australia. Dr. Zhou received the B.Eng and M.Eng degrees from Harbin Institute of Technology, Harbin, China in 1982 and 1984, respectively, and the PhD degree from The Australian National University, Canberra, Australia, in 1991. Before joining Deakin University, Dr. Zhou has been a Lecturer in Chengdu Institute of Radio Engineering (University of Electronic Science and Technology of China), China, a programmer in Apollo/HP at Massachusetts, U.S.A., a Lecturer in National University of Singapore, Singapore, and a Lecturer in Monash University, Melbourne, Australia. His research interests include distributed computing, computer networks, IT security, performance evaluation, and fault-tolerant computing, and he has published extensively in these research areas. Dr. Zhou is a member of the IEEE and IEEE Computer Society, and the ACM.

(25)

(26)

Table of Figures

Figure 2.1. The basic client-server model Figure 2.2. Printing service (a service example) Figure 2.3. Indirect client-server cooperation Figure 2.4. Examples of three-tier configurations

Figure 2.5. An example implementation of the three-tier architecture Figure 2.6. Service discovery -- broadcast approach

Figure 2.7. Service discovery -- name server and server location lookup Figure 2.8. A distributed computing system architecture

Figure 3.1. CORBA CDR message

Figure 3.2. Time diagram of the execution of message-passing primitives Figure 3.3. Send primitives: (a) blocking; (b) non-blocking

Figure 3.4. Blocked send primitive

Figure 3.5. Unbuffered and buffered message passing Figure 3.6. Message passing; (a) unreliable; (b) reliable

Figure 3.7. Message-passing semantics. (a) at-least-once; (b) exactly-once Figure 3.8. An RPC example: a read call

Figure 3.9. Group structures

Figure 4.1. The layered protocol model Figure 4.2. The OSI reference model

Figure 4.3. Comparison of Internet and OSI architectures Figure 4.4. The Layered TCP/IP protocol suite

Figure 5.1. The distributed application model Figure 5.2. BSD interprocess sockets Figure 5.3. File and socket descriptors Figure 5.4. Socket model

Figure 7.1. DCE architecture Figure 7.2. Build a DEC Application

Figure 7.3. Using threads in a client-server application Figure 7.4. The CDS directory hierarchy

Figure 7.5. Components of the DCE directory service Figure 7.6. Time synchronisation using intervals

Figure 7.7. Time synchronisation within a multi-LAN cell Figure 7.8. Interactions between DFS components Figure 7.9. The RMI architecture

Figure 8.1. Causal ordering rule (Group G={S1, S2, S3}) Figure 8.2. Causal ordering and total ordering

Figure 8.3. Reliable multicast system architecture Figure 8.4. A group comprises of n+1 sites Figure 8.5. Groups

Figure 8.6. A group of n members form a logical token ring Figure 8.7. Logical token ring structure and normal operations Figure 8.8. Message transmission example

Figure 8.9. Dynamic membership Figure 8.10. System structure Figure 8.11. RMP hierarchy structure Figure 8.12. Packet for logical ring

17 18 21 22 23 26 27 29 35 36 39 39 41 42 43 46 52 66 66 68 70 82 83 83 84 136 138

139 140 141 143 144 145 164 179 180 180 183 184 184 191 192 196 201 203 204

(27)

Figure 8.13. Ordered multicast

Figure 8.14. Steps taken for RMP to multicast an ordered message Figure 9.1. A system

Figure 9.2. Fault, error, and failure Figure 9.3. The bathtub curve

Figure 9.4. Relationships between MTBF, MTTF, and MTTR Figure 9.5. Reconfigurable duplication architecture

Figure 9.6. Reliability block diagram of a series system Figure 9.7. The example reliability block diagram

Figure 9.8. Reliability block diagram of the parallel system Figure 9.9. Example reliability block diagram

Figure 9.10. Reduced reliability block diagram Figure 9.11. State diagram of a TRM system

Figure 9.12. Reduced state diagram of a TMR system

Figure 9.13. Reliability block diagram of a simple parallel system Figure 9.14. Impact of the fault coverage

Figure 9.15. Reliability comparison of TMR and a single module Figure 9.16. Deadlock

Figure 9.17. False deadlock

Figure 9.18. Distributed deadlock detection

Figure 9.19. The Primary-Backup Replication Scheme Figure 9.20. Hot replication implementations

Figure 9.21. The active replication scheme

Figure 9.22. The scenario of requests arriving in different orders Figure 9.23. The Primary-Peer Replication Scheme

Figure 10.1. Single (private) key encryption Figure 10.2. Key distribution server Figure 10.3. Public key encryption Figure 10.4. Packet filter in a router Figure 10.5. Internet firewall

Figure 10.6. A hierarchical model of a DDoS attack

Figure 10.7. Key methods used before making an effective DDoS attack Figure 10.8. Active defense cycle

Figure 11.1. The generic reactive system architecture Figure 11.2. A DMM agent

Figure 11.3. Sensors and actuators

Figure 11.4. Tunnelling multicast packets between subnets Figure 11.5. The generic sensor architecture

Figure 11.6. A distributed replication system

Figure 11.7. Replication manager and database server Figure 11.8. Using polling sensors

Figure 11.9. Using event sensors Figure 11.10. Using embedded DMMs

Figure 11.11. Using polling sensors for network partitioning Figure 11.12. Using event sensors for partition-tolerant applications Figure 12.1. Two-tier architecture of WBDB

Figure 12.2. Three-tier architecture of WBDB

Figure 12.3. Hybrid architecture of WBDB ( agent-based)

206 207 214 214 215 218 222 227 228 228 229 229 230 231 231 232 233 237 238 238 243 244 245 247 250 256 258 258 264 264 266 270 282 296 298 301 306 307 312 313 316 316 317 318 319 329 330 331

(28)

Figure 12.4. Generation 1 framework (CGI-based) of WBDB

Figure 12.5. Generation 2 framework (Client and Server-side JavaScript) of WBDB Figure 12.6. Generation 3 (JDBC-based) framework of WBDB

Figure 12.7. Servlet-based framework of WBDB Figure 12.8. XML–based two-tier framework of WBDB Figure 12.9. Mobile agent involved framework of WBDB Figure 12.10. Intelligent interactive framework of WBDB Figure 13.1. Example of Mobile applications

Figure 13.2. IP Tunneling

Figure 13.3. Operation of Mobile Node under Mobile IP Figure 13.4. Operation of Mobile IP on care-of address

Figure 13.5. Operation of Mobile IP on collocated care-of address

Figure 13.6. ICMP Router Advertisement and Mobility Agent Advertisement Extension Message

Figure 13.7. ICMP Router Solicitation Message

Figure 13.8. The mobility agents (either home or foreign) multicast Agent Advertisement

Figure 13.9. The message format of Registration Request and Mobile-Foreign Authentication Extension

Figure 13.10. The message format of Registration Reply Figure 13.11. Bi-directional Tunneled Multicast Method Figure 13.12. MMP topology and Mobile Connections Figure 13.13. Network Topology of the Simulation Figure 13.14. Message delivery delays

Figure 13.15. Number of delivered messages Figure 14.1. A distributed file system structure Figure 14.2. NFS structure

Figure 14.3. OMA Reference Architecture Figure 14.4. CORBA client and server Figure 14.5. CORBA architecture

Figure 14.6. Interface inheritance and implementation inheritance Figure 14.7. Distributed COM is built on top of DCE RPC Figure 14.8. DCOM security

Figure 14.9. DCOM binary specification

333 334 335 337 337 339 345 369 373 377 377 378 379 379 385 387 389 397 398 403 404 405 408 409 420 421 422 425 428 430 431

(29)

(30)

CHAPTER 1 OVERVIEW OF DISTRIBUTED NETWORK SYSTEMS

In this Chapter we outline the basic concepts of distributed systems and computer networks, such as their purposes, characteristics, advantages, and limitations, as well as their basic architectures and their applications.

1.1 Distributed Systems

A distributed system is a system consisting of a collection of autonomous machines connected by communication networks and equipped with software systems designed to produce an integrated and consistent computing environment.

Distributed systems enable people to cooperate and coordinate their activities more effectively and efficiently. The key purposes of the distributed systems can be represented by: resource sharing, openness, concurrency, scalability, fault-tolerance and transparency [Coulouris et al 1994].

Resource sharing. In a distributed system, the resources - hardware, software and data can be easily shared among users. For example, a printer can be shared among a group of users.

Openness. The openness of distributed systems is achieved by specifying the key software interface of the system and making it available to software developers so that the system can be extended in many ways.

Concurrency. The processing concurrency can be achieved by sending requests to multiple machines connected by networks at the same time.

Scalability. A distributed system running on a collection of a small number of machines can be easily extended to a large number of machines to increase the processing power.

Fault-tolerance. Machines connected by networks can be seen as redundant resources, a software system can be installed on multiple machines so that in the face of hardware faults or software failures, the faults or failures can be detected and tolerated by other machines.

Transparency. Distributed systems can provide many forms of transparency such as:

Location transparency, which allows local and remote information to be accessed in a unified way;

1)

(31)

Failure transparency, which enables the masking of failures automatically;

and

Replication transparency, which allows duplicating software/data on multiple machines invisibly.

2)

3)

Computing in the late 1990s has reached the state of Web-based distributed computing. A basis of this form of computing is distributed computing which is carried out on distributed computing systems. These systems comprise the following three fundamental components:

personal computers and powerful server computers, local and fast wide area networks, internet, and

systems, in particular distributed operating systems, and application software.

In this book we are interested in the last two issues of distributed computing systems: networks and system and application software.

With the flourishing of the Internet and the current quick development of e- commerce, it is very important in designing distributed systems to consider not only traditional applications but also the requirements of distributed computing based on the Internet.

1.2 Computer Networks 1.2.1 Network History

The following table ([Stallings 1998]) shows a brief networking history.

(32)

1.2.2 Network Architecture

The early success of the ARPANET (sponsored by the Advanced Research Projects Agency (ARPA) and developed during the late 1960s and early 1970s) and other networks, and the immediate commercial potential of packet switching, satellite, and local network technology made it apparent that computer networking was quickly becoming an important area of innovation and commerce. It was also apparent that to utilize the full potential of such computer networks, international standards would be required to ensure that any system could communicate with any other system anywhere in the world.

In 1978, a new subcommittee (SC16) was created by the International Organization for Standardization (ISO) Technical Committee 97 on Information Processing to develop standards for “open system interconnection (OSI)”. The term “open” was chosen to emphasize that by conforming to OSI standards, a system would be open to communication with any other system anywhere in the world obeying the same standards.

The OSI reference model is a seven-layer model for inter-process communication.

Its architecture is comprised of application, presentation, session, transport, network, data link and physical layers, and the corresponding protocols, as depicted in Table 1.2. The detailed descriptions of these layers are given in Chapter 4.

The early-developed ARPANET adopts another type of network architecture, i.e., four-layer architecture: application, transport, Internet, and network interface, as depicted in Table 1.3. The current Internet based on ARPANET uses this

(33)

architecture, which is also known as TCP/IP reference model. In this model, the network interface (or access) layer relies on the data link and physical layers of the network, and the application layer corresponds to application and presentation layers of the OSI model, since there is no session layer in the TCP/IP model. The detail of these layers is also given in Chapter 4.

1.2.3 Network Fault Tolerance

Network reliability refers to the reliability of the overall network to provide communication in the event of failure of a component or components in the network. The term network fault tolerance refers to how resilient the network is against the failure of a component.

Why fault tolerance in a networked world? A key indicator of today’s global business systems is the reliability and uptime [Grimshaw et al. 1999]. This concern is crucial for e-commerce sites and mission-critical business applications. Expensive and powerful servers and system components that are designed as stand-alone systems can be very reliable, but even an hour of downtime per month can be deadly to online-only businesses.

For example, server clusters are increasingly used in business and academia to combat the problems of reliability since they are relatively inexpensive and easy to build [Buyya 1999] [TBR 1998]. By having multiple network servers working together in a cluster and using redundant components such as more than one power supply and RAID hard drive subsystems, the overall system uptime in theory can approach 100 percent. However, server clusters are only a part of a chain that links business applications together. For example, to access an HTML page of a business web site, a user issues a request that travels from the user’s client machine, through a number of routers and firewalls and other network devices to reach the web site.

The web site then processes the request and returns the requested HTML page via the same or another chain of routers, firewalls and network devices. The strength of this chain, in terms of reliability and performance, will determine the success or failure of the business, but a chain is only as strong as its weakest link, and the longer the chain, the weaker it is in general. Intuitively, the following two ways can be used to make such a chain stronger: one is the use of redundancy (replication) and concurrency (parallelism) techniques, and the other is to increase the reliability of the weakest link of the chain.

(34)

A networked world faces a number of challenges in fault tolerance. In particular, Internet-connected resources have the following characteristics:

Unreliable communications;

Unreliable resources (computers, storage, software, etc.);

Highly heterogeneous environment;

Potentially very large amount of resources: scalability;

Potentially highly variable number of resources.

Communication network reliability depends on the sustainability of both hardware and software. It is possible that, depending on failure senario, a variety of network failures can last from a few seconds to days. Traditionally, such failures were primarily from hardware malfunctions that result in downtime (or “outage period”) of a network element (a node or a link). Thus, the emphasis was on the element- level network availability and, in turn, the determination of overall network availability. However, other types of major outages have received much attention in recent years. Such incidents include accidental fiber cable cut, natural disasters, and malicious attack (both hardware and software). These major failures need more than what is traditionally addressed through network availability.

These types of failures cannot be addressed by congestion control schemes alone because of their drastic impact on the network. Such failures can, for example, drop a significant number of existing network connections; thus, the network is required to have the ability to detect a fault and isolate it, and then either the network must reconnect the affected connections or the user may try to reconnect it (if the network does not have reconnect capability). At the same time, the network may not have enough capacity and capability to handle such a major simultaneous “reconnect”

phase. Likewise, because of a software and/or protocol error, the network may appear very congested to the user. Thus, network reliability nowadays encompasses more than what was traditionally addressed through network availability.

Basic techniques used in dealing with network failures include: retry (retransmission), complemented retry with correction, replication (e.g., dual bus), coding, special protocols (single handshake, double handshake, etc.), timing checks, rerouting, and retransmission with shift (intelligent retry), etc..

1.3 Protocols and QoS

Network software is arranged in a hierarchy of layers. Each layer presents an interface to the layers above it that extends the properties of the underlying communication system. One layer on one machine carries on a conversation with the same layer on another machine. The rules and conventions used in this conversation are collectively known as the protocol of this layer. Generally speaking, a protocol is an agreement between the communication parties on how communication is to proceed. The definition of a protocol has two important parts:

A specification of the sequence of messages that must be exchanged;

DISTRIBUTED NETWORK SYSTEMS

DISTRIBUTED NETWORK SYSTEMS

Network Theory and Applications

DISTRIBUTED NETWORK SYSTEMS

From Concepts to Implementations

Springer

Contents

Preface xvii

Acknowledgements xxi

Biography of Authors xxiii

Table of Figures xxv

Chapter 1 Overview of Distributed Network Systems 1

Chapter 2 Modelling for Distributed Network Systems: The Client-Server Model

15

Chapter 3 Communication Paradigms for Distributed

Network Systems 33

Chapter 4 Internetworking

65

Chapter 5 Interprocess Communication using Message Passing

Chapter 6 TCP/UDP Communication in Java

79

105

Chapter 7 Interprocess Communication using RPC

135

Chapter 8 Group Communications

Chapter 9 Reliability and Replication Techniques

175

213

Chapter 10 Security

255

Chapter 11 A Reactive System Architecture for Fault- Tolerant Computing

295

Chapter 12 Web-Based Databases

Chapter 13 Mobile Computing

325

369

Chapter 14 Distributed Network Systems: Case Studies

407

Chapter 15 Distributed Network Systems: Current Development

435

References Index

472

509

Preface

Acknowledgements

Biography of Authors

Table of Figures

CHAPTER 1 OVERVIEW OF DISTRIBUTED NETWORK SYSTEMS